Compare Claude, GPT-4o, Gemini, DeepSeek, and Mistral — input/output costs, cached rates, context windows, and best-value picks for every use case.
All prices per million tokens (MTok). Cached rates apply after the first request within the cache TTL window.
| Model | Provider | Input ($/MTok) | Output ($/MTok) | Cached Input | Context |
|---|---|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | 1M tokens | |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | $0.075 | 128K tokens |
| DeepSeek-V3 | DeepSeek | $0.27 | $1.10 | $0.07 | 128K tokens |
| Mistral Medium | Mistral | $0.40 | $2.00 | — | 128K tokens |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | $0.08 | 200K tokens |
| Gemini 2.5 Pro | $1.25 | $5.00 | $0.31 | 1M tokens | |
| GPT-4o | OpenAI | $2.50 | $10.00 | $1.25 | 128K tokens |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.30 | 200K tokens |
| OpenAI o4-mini | OpenAI | $1.10 | $4.40 | $0.275 | 200K tokens |
| OpenAI o3 | OpenAI | $10.00 | $40.00 | $2.50 | 200K tokens |
| Claude Opus 4.7 | Anthropic | $15.00 | $75.00 | $1.50 | 200K tokens |
Gemini 2.0 Flash — $0.10/MTok input, 1M context. Best raw tokens per dollar for non-critical workloads.
Claude Haiku 4.5 with caching — $0.08/MTok cached reads. Repeated context makes it the cheapest option at scale.
Claude Sonnet 4.6 — $3.00/MTok input. Consistently the top performer on coding and reasoning benchmarks at non-frontier pricing.
Gemini 2.5 Pro — $1.25/MTok input, 1M context window. 4× cheaper than Sonnet for long-document work.
DeepSeek-V3 — $0.27/MTok API or self-host for near-zero marginal cost. Strong benchmark performance relative to price.
OpenAI o4-mini — $1.10/MTok input for reasoning tasks. Better than GPT-4o on hard math/coding at lower price.
Anthropic and OpenAI both offer 50% batch discounts for async workloads (processing within 24h). Google Gemini does not yet have a batch discount tier as of 2026.
| Model | Standard Input | Batch Input | Batch Output |
|---|---|---|---|
| Claude Haiku 4.5 | $0.80/MTok | $0.40/MTok | $2.00/MTok |
| Claude Sonnet 4.6 | $3.00/MTok | $1.50/MTok | $7.50/MTok |
| GPT-4o-mini | $0.15/MTok | $0.075/MTok | $0.30/MTok |
| GPT-4o | $2.50/MTok | $1.25/MTok | $5.00/MTok |
Paste any prompt into the calculator to see what it costs across all major LLM APIs — including cached rates and monthly projections.
Gemini 2.0 Flash at $0.10/MTok input is the cheapest capable hosted model. DeepSeek-V3 at $0.27/MTok offers stronger benchmark performance at very low cost. For instruction-following with strong quality guarantees, GPT-4o-mini ($0.15/MTok) and Claude Haiku 4.5 with caching ($0.08/MTok cached) are the best budget options.
Claude Sonnet 4.6 ($3.00/MTok input, $15.00/MTok output) is slightly more expensive than GPT-4o ($2.50/MTok input, $10.00/MTok output) at standard rates. For long-context apps with caching, Sonnet cache reads ($0.30/MTok) are cheaper than GPT-4o's cached rate. Both models are competitive on quality for most tasks.
Yes — Gemini 2.5 Pro ($1.25/MTok input) is 58% cheaper than Claude Sonnet 4.6 ($3.00/MTok) with a 1M token context window vs Sonnet's 200K. For long-document analysis and cost-sensitive apps, Gemini 2.5 Pro is a strong alternative. Claude Sonnet still leads on instruction-following precision and agentic task quality for many teams.
Claude (Anthropic) has the most aggressive caching pricing: cache reads at 10% of the standard input rate (e.g., $0.08/MTok for Haiku). OpenAI's cached input rate is 50% of standard. Google Gemini offers context caching with an hourly storage charge model. For apps with large, fixed system prompts, Claude's caching gives the biggest savings.
Yes — frontier model prices have dropped 10-100× over the past two years. GPT-4-level capability (as of 2023) now costs under $0.50/MTok via several providers. Competition from Google, Anthropic, DeepSeek, and Mistral continues to push prices down. Budget at today's rates but expect further decreases as scale increases.