Claude Haiku Pricing 2026

Claude Haiku 4.5: $0.80/MTok input — with 90% prompt caching discount (cache reads at $0.08/MTok). Complete pricing, use cases, and comparison vs alternatives.

Claude Haiku 4.5 — Full Pricing Breakdown

Token Type Price (per 1M tokens) Notes
Input (standard) $0.80 Your prompt + context
Output $4.00 Model-generated response
Cache write $1.00 1.25× input price, one-time per TTL window
Cache read 90% off $0.08 10% of standard input price — the key saving

Claude Haiku vs Budget Model Alternatives

Model Input Output Cache Read Context
Gemini 2.0 Flash Cheapest raw $0.10 $0.40 $0.025 (75% off) 1M tokens
GPT-4o-mini $0.15 $0.60 $0.075 (50% off) 128k tokens
Claude Haiku 4.5 Best cache savings $0.80 $4.00 $0.08 (90% off) 200k tokens
Claude Sonnet 4.6 $3.00 $15.00 $0.30 (90% off) 200k tokens
GPT-4o $2.50 $10.00 $1.25 (50% off) 128k tokens

When Claude Haiku Beats GPT-4o-mini on Cost

Claude Haiku has a higher sticker price than GPT-4o-mini — but prompt caching flips the equation for apps with repeated context.

Example: A customer support chatbot with a 3,000-token system prompt, handling 20,000 conversations/day (10 turns each = 200,000 turns/day):

Model System Prompt Cost/day Total Daily Cost Monthly Cost
GPT-4o-mini (50% cache) $0.045 (cache read) ~$150 ~$4,500
Claude Haiku (90% cache) Winner $0.024 (cache read) ~$110 ~$3,300

The 90% cache discount more than offsets Haiku's higher standard rate — resulting in ~27% lower total cost for repeated-context workloads.

Best Use Cases for Claude Haiku

Customer support chatbots

Large system prompts (company FAQs, product docs) cached at $0.08/MTok. Haiku's instruction following quality is strong for support tasks. 90% cache discount makes it cost-efficient at scale.

Classification & extraction

Categorize tickets, extract structured data from text, classify sentiment. Haiku handles structured output tasks reliably. High-volume processing at 0.08¢ per 1,000 cached input tokens.

Multi-turn agents

Agents that reuse large tool descriptions and memory blocks benefit most from caching. Haiku's 200k context handles complex agent state. Cache TTL resets naturally with each conversation turn.

Real-time applications

Claude Haiku is Anthropic's fastest model — sub-second response times for short outputs. For user-facing apps where latency matters and tasks are straightforward, Haiku delivers Claude quality at speed.

Claude Haiku Cost Calculator — Quick Reference

Scenario Input Tokens Output Tokens Cost/Call (standard) Cost/Call (cached input)
Simple classification 200 10 $0.00016 $0.000016
Support chat turn 500 150 $0.001 $0.00064
Document summary 2,000 300 $0.0028 $0.00136
Code review (short) 1,000 400 $0.0024 $0.00168

Calculate Your Actual Claude Haiku Cost

Paste your real prompt and see exact costs across Claude Haiku, Claude Sonnet, GPT-4o-mini, and Gemini — with cache savings and monthly volume projections.

Open the LLM Pricing Calculator →

Frequently Asked Questions

How much does Claude Haiku cost per token?

Claude Haiku 4.5 costs $0.80 per million input tokens and $4.00 per million output tokens. That's $0.0008 per 1,000 input tokens and $0.004 per 1,000 output tokens. With prompt caching enabled, cache reads cost $0.08/MTok — a 90% discount that makes Haiku extremely cost-effective for repeated-context applications.

What is the difference between Claude Haiku and Claude Sonnet?

Claude Haiku 4.5 is Anthropic's fastest and cheapest model — optimized for high-volume, latency-sensitive tasks. Claude Sonnet 4.6 is the mid-tier model with significantly better reasoning, instruction following, and code quality. Haiku costs $0.80/MTok vs Sonnet's $3.00/MTok. Use Haiku for straightforward tasks; upgrade to Sonnet when output quality isn't sufficient.

Does Claude Haiku support prompt caching?

Yes. Claude Haiku 4.5 fully supports Anthropic's prompt caching feature. Mark cache breakpoints in your API request, and subsequent calls within the cache TTL window (typically 5 minutes, configurable up to longer) pay only $0.08/MTok for cached input — 90% less than the standard $0.80/MTok. Cache writes cost $1.00/MTok (one-time per TTL window).

How long does Claude Haiku prompt cache last?

Claude prompt cache TTL defaults to 5 minutes. If you're building a high-traffic app, the cache stays warm as long as requests arrive within 5-minute windows. For low-traffic apps or overnight periods, the cache may expire and need to be rewritten on the next call (at the $1.00/MTok write cost). Anthropic may offer longer TTL options for enterprise customers.

Is Claude Haiku good enough for production apps?

Yes, for many use cases. Claude Haiku performs well on: customer support responses, content classification, structured data extraction, summarization, simple Q&A, and code explanations. It underperforms vs Sonnet on: complex reasoning chains, nuanced creative writing, advanced code generation, and tasks requiring careful instruction parsing. Test your specific task before assuming you need Sonnet.

Also see: GPT-4o vs Claude Cost · Claude vs GPT Pricing · Gemini API Pricing · LLM Cost Comparison 2026 · OpenAI API Cost Calculator