Claude Haiku 4.5: $0.80/MTok input — with 90% prompt caching discount (cache reads at $0.08/MTok). Complete pricing, use cases, and comparison vs alternatives.
| Token Type | Price (per 1M tokens) | Notes |
|---|---|---|
| Input (standard) | $0.80 | Your prompt + context |
| Output | $4.00 | Model-generated response |
| Cache write | $1.00 | 1.25× input price, one-time per TTL window |
| Cache read 90% off | $0.08 | 10% of standard input price — the key saving |
| Model | Input | Output | Cache Read | Context |
|---|---|---|---|---|
| Gemini 2.0 Flash Cheapest raw | $0.10 | $0.40 | $0.025 (75% off) | 1M tokens |
| GPT-4o-mini | $0.15 | $0.60 | $0.075 (50% off) | 128k tokens |
| Claude Haiku 4.5 Best cache savings | $0.80 | $4.00 | $0.08 (90% off) | 200k tokens |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 (90% off) | 200k tokens |
| GPT-4o | $2.50 | $10.00 | $1.25 (50% off) | 128k tokens |
Claude Haiku has a higher sticker price than GPT-4o-mini — but prompt caching flips the equation for apps with repeated context.
Example: A customer support chatbot with a 3,000-token system prompt, handling 20,000 conversations/day (10 turns each = 200,000 turns/day):
| Model | System Prompt Cost/day | Total Daily Cost | Monthly Cost |
|---|---|---|---|
| GPT-4o-mini (50% cache) | $0.045 (cache read) | ~$150 | ~$4,500 |
| Claude Haiku (90% cache) Winner | $0.024 (cache read) | ~$110 | ~$3,300 |
The 90% cache discount more than offsets Haiku's higher standard rate — resulting in ~27% lower total cost for repeated-context workloads.
Large system prompts (company FAQs, product docs) cached at $0.08/MTok. Haiku's instruction following quality is strong for support tasks. 90% cache discount makes it cost-efficient at scale.
Categorize tickets, extract structured data from text, classify sentiment. Haiku handles structured output tasks reliably. High-volume processing at 0.08¢ per 1,000 cached input tokens.
Agents that reuse large tool descriptions and memory blocks benefit most from caching. Haiku's 200k context handles complex agent state. Cache TTL resets naturally with each conversation turn.
Claude Haiku is Anthropic's fastest model — sub-second response times for short outputs. For user-facing apps where latency matters and tasks are straightforward, Haiku delivers Claude quality at speed.
| Scenario | Input Tokens | Output Tokens | Cost/Call (standard) | Cost/Call (cached input) |
|---|---|---|---|---|
| Simple classification | 200 | 10 | $0.00016 | $0.000016 |
| Support chat turn | 500 | 150 | $0.001 | $0.00064 |
| Document summary | 2,000 | 300 | $0.0028 | $0.00136 |
| Code review (short) | 1,000 | 400 | $0.0024 | $0.00168 |
Paste your real prompt and see exact costs across Claude Haiku, Claude Sonnet, GPT-4o-mini, and Gemini — with cache savings and monthly volume projections.
Open the LLM Pricing Calculator →Claude Haiku 4.5 costs $0.80 per million input tokens and $4.00 per million output tokens. That's $0.0008 per 1,000 input tokens and $0.004 per 1,000 output tokens. With prompt caching enabled, cache reads cost $0.08/MTok — a 90% discount that makes Haiku extremely cost-effective for repeated-context applications.
Claude Haiku 4.5 is Anthropic's fastest and cheapest model — optimized for high-volume, latency-sensitive tasks. Claude Sonnet 4.6 is the mid-tier model with significantly better reasoning, instruction following, and code quality. Haiku costs $0.80/MTok vs Sonnet's $3.00/MTok. Use Haiku for straightforward tasks; upgrade to Sonnet when output quality isn't sufficient.
Yes. Claude Haiku 4.5 fully supports Anthropic's prompt caching feature. Mark cache breakpoints in your API request, and subsequent calls within the cache TTL window (typically 5 minutes, configurable up to longer) pay only $0.08/MTok for cached input — 90% less than the standard $0.80/MTok. Cache writes cost $1.00/MTok (one-time per TTL window).
Claude prompt cache TTL defaults to 5 minutes. If you're building a high-traffic app, the cache stays warm as long as requests arrive within 5-minute windows. For low-traffic apps or overnight periods, the cache may expire and need to be rewritten on the next call (at the $1.00/MTok write cost). Anthropic may offer longer TTL options for enterprise customers.
Yes, for many use cases. Claude Haiku performs well on: customer support responses, content classification, structured data extraction, summarization, simple Q&A, and code explanations. It underperforms vs Sonnet on: complex reasoning chains, nuanced creative writing, advanced code generation, and tasks requiring careful instruction parsing. Test your specific task before assuming you need Sonnet.
Also see: GPT-4o vs Claude Cost · Claude vs GPT Pricing · Gemini API Pricing · LLM Cost Comparison 2026 · OpenAI API Cost Calculator