Claude Sonnet 4.6: $3.00/MTok input — with 90% prompt caching discount (cache reads at $0.30/MTok). The flagship model for production AI apps. Complete pricing, comparison, and savings calculator.
| Token Type | Price (per 1M tokens) | Notes |
|---|---|---|
| Input (standard) | $3.00 | Your prompt + context |
| Output | $15.00 | Model-generated response |
| Cache write | $3.75 | 1.25× input price, one-time per TTL window |
| Cache read 90% off | $0.30 | 10% of standard input price — the key saving |
| Model | Input | Output | Cache Read | Context |
|---|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | 1M tokens |
| GPT-4o-mini | $0.15 | $0.60 | $0.075 | 128k tokens |
| GPT-4o | $2.50 | $10.00 | $1.25 (50% off) | 128k tokens |
| Claude Sonnet 4.6 Best cache savings | $3.00 | $15.00 | $0.30 (90% off) | 200k tokens |
| Gemini 1.5 Pro | $3.50 | $10.50 | — | 2M tokens |
| Claude Opus 4.7 | $15.00 | $75.00 | $1.50 (90% off) | 200k tokens |
Claude Sonnet has a slightly higher sticker price than GPT-4o — but its 90% caching discount (vs GPT-4o's 50%) flips the equation for production apps with repeated context.
Example: A coding assistant with a 5,000-token system prompt, handling 30,000 daily queries (average 1,000 tokens user input, 500 tokens output):
| Model | System Prompt Cost/day (cached) | User input + output/day | Monthly Total |
|---|---|---|---|
| GPT-4o (50% cache) | $18.75 (cache read) | $225 input + $150 output | ~$11,800 |
| Claude Sonnet 4.6 (90% cache) Winner | $4.50 (cache read) | $90 input + $225 output | ~$9,600 |
The 90% caching discount saves ~$2,200/month vs GPT-4o for this scenario — despite Sonnet's slightly higher standard input rate.
Claude Sonnet is Anthropic's workhorse model — optimized for reliability, quality, and cost. The default choice for any production workload where Haiku underperforms and Opus's cost isn't justified. Most developers start with Sonnet and never need to upgrade.
Sonnet consistently outperforms GPT-4o on SWE-bench coding benchmarks. Strong at multi-file refactors, API integration, test generation, and debugging. Supports 200k context for large codebase tasks. Pairs with prompt caching for repeated code scaffolding.
200k context + 90% cache discount makes Sonnet ideal for RAG applications. Cache your document corpus once, then answer thousands of questions at $0.30/MTok instead of $3.00/MTok. Sonnet excels at synthesizing information across long documents.
Sonnet's instruction-following fidelity makes it reliable for tool-use agent loops. Better than Haiku at resolving ambiguous instructions and handling edge cases. Lower hallucination rate than GPT-4o on structured task completion. Supports Claude's computer-use tools.
| Scenario | Input Tokens | Output Tokens | Cost/Call (standard) | Cost/Call (cached input) |
|---|---|---|---|---|
| Code review | 1,500 | 600 | $0.0135 | $0.0045 (cached sys prompt) |
| Document Q&A | 4,000 | 400 | $0.018 | $0.0072 |
| Agent tool call | 2,000 | 200 | $0.009 | $0.003 |
| Long-form writing | 500 | 2,000 | $0.0315 | $0.0315 |
Paste your real prompt and see exact token costs across Claude Sonnet, Opus, Haiku, GPT-4o, and Gemini — with cache savings and monthly volume projections.
Open the LLM Pricing Calculator →Claude Sonnet 4.6 costs $3.00 per million input tokens and $15.00 per million output tokens. That's $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens. With prompt caching enabled, cache reads cost $0.30/MTok — a 90% discount that makes Sonnet highly competitive for repeated-context production workloads.
Claude Sonnet 4.6 is the current generation, succeeding the Claude 3.5 Sonnet series. The 4.x generation offers improved reasoning, better instruction following, and enhanced computer-use capabilities. Pricing is similar to the 3.5 generation: $3.00/MTok input, $15.00/MTok output. If you're migrating from claude-3-5-sonnet, update your model ID to claude-sonnet-4-6 to access the latest capabilities.
Start with Claude Haiku ($0.80/MTok) for high-volume tasks where speed and cost matter most. Upgrade to Sonnet ($3.00/MTok) when: output quality isn't meeting requirements, tasks require complex reasoning or nuanced instruction following, or you need better code generation. Many production apps use Haiku for 80% of traffic and Sonnet for complex or high-stakes queries — a routing pattern that cuts costs significantly.
Yes, Claude Sonnet 4.6 supports extended thinking mode — where the model reasons step-by-step before generating its response. Extended thinking tokens are billed as output tokens ($15.00/MTok), so they add cost but improve accuracy on complex reasoning tasks like math, logic puzzles, and multi-step planning. For most production apps, standard mode is sufficient and more cost-effective.
Claude Sonnet 4.6 has a 200,000-token context window — about 150,000 words or roughly 600 pages of text. This is significantly larger than GPT-4o's 128k context. Combined with prompt caching (cache your document corpus at 90% discount), Sonnet handles large-context RAG applications, codebase analysis, and long-document Q&A efficiently.
Also see: Claude Haiku Pricing · Claude Opus Pricing · GPT-4o vs Claude Cost · Gemini API Pricing · LLM Cost Comparison 2026