Every major AI API ranked by price per token — Gemini, Claude, GPT-4o, Mistral. Find the cheapest LLM for your workload.
| Rank | Model | Provider | Input (per 1M) | Output (per 1M) | Context |
|---|---|---|---|---|---|
| 1 | Gemini 2.0 Flash-Lite Cheapest | $0.075 | $0.30 | 1M tokens | |
| 2 | Gemini 2.0 Flash | $0.10 | $0.40 | 1M tokens | |
| 3 | GPT-4o-mini | OpenAI | $0.15 | $0.60 | 128k tokens |
| 4 | Claude Haiku 4.5 90% cache discount | Anthropic | $0.80 | $4.00 | 200k tokens |
| 5 | Gemini 1.5 Flash | $0.075 | $0.30 | 1M tokens | |
| 6 | Gemini 1.5 Pro (≤128k) | $1.25 | $5.00 | 1M tokens | |
| 7 | GPT-4o Popular | OpenAI | $2.50 | $10.00 | 128k tokens |
| 8 | Claude Sonnet 4.6 Best quality/cost | Anthropic | $3.00 | $15.00 | 200k tokens |
| 9 | Claude Opus 4.7 Most capable | Anthropic | $15.00 | $75.00 | 200k tokens |
Prompt caching fundamentally changes the cost hierarchy. For apps that reuse a large system prompt (agents, chatbots, RAG), effective input costs drop significantly:
| Model | Standard Input | Cached Input | Savings |
|---|---|---|---|
| Claude Haiku 4.5 (cache read) | $0.80/MTok | $0.08/MTok | 90% off |
| Claude Sonnet 4.6 (cache read) | $3.00/MTok | $0.30/MTok | 90% off |
| GPT-4o (cached) | $2.50/MTok | $1.25/MTok | 50% off |
| GPT-4o-mini (cached) | $0.15/MTok | $0.075/MTok | 50% off |
| Gemini 2.0 Flash (cache) | $0.10/MTok | $0.025/MTok | 75% off |
Gemini 2.0 Flash or Flash-Lite. Best for classification, extraction, summarization at high volume. Quality is surprisingly good for structured tasks.
Claude Haiku with caching (effective ~$0.08–0.80/MTok), or GPT-4o-mini. Good for agents and chatbots with repeated context. Claude wins on cache depth.
Claude Sonnet 4.6 or GPT-4o. Best for complex reasoning, code generation, and instruction following. Claude Sonnet has a slight quality edge for coding tasks.
Claude Opus 4.7 for the most complex autonomous reasoning and agentic tasks. Only use when Sonnet-level quality genuinely isn't sufficient.
Scenario: 100,000 calls/day, 500 input tokens + 200 output tokens each = 50M input + 20M output tokens/day.
| Model | Daily Cost | Monthly Cost | vs GPT-4o |
|---|---|---|---|
| Gemini 2.0 Flash | $13 | $390 | 96% cheaper |
| GPT-4o-mini | $19.50 | $585 | 94% cheaper |
| Claude Haiku 4.5 | $120 | $3,600 | 74% cheaper |
| GPT-4o | $325 | $9,750 | baseline |
| Claude Sonnet 4.6 | $450 | $13,500 | 38% more |
Paste your actual prompt and see costs across all models — with monthly projections and cache savings estimates.
Open the LLM Cost Calculator →Gemini 2.0 Flash-Lite at $0.075/MTok input is currently the cheapest production-grade LLM API. For self-hosted or batch workloads, Mistral 7B via providers like Groq or Fireworks can be even cheaper, but requires more engineering overhead. Among hosted APIs, Gemini Flash is the clear cost leader.
Avoid budget models for: complex multi-step reasoning, code generation with correctness requirements, long-context synthesis, safety-critical applications, or tasks requiring precise instruction following. The quality gap between Gemini Flash and Claude Sonnet is minimal for simple tasks but significant for complex ones — measure quality on your specific task before committing to a cheap model at scale.
Model routing sends simple queries to cheap models (Gemini Flash, GPT-4o-mini) and complex queries to powerful models (Claude Sonnet, GPT-4o). A classifier determines query complexity upfront. This pattern typically reduces bills by 50–80% with minimal quality degradation, since 70–80% of production queries are simple enough for cheap models.
Yes. LLM API prices have dropped 10–20× over the past 2 years. GPT-4-class capabilities that cost $30/MTok in 2023 cost $3/MTok today. This trend is expected to continue as model efficiency improves and competition increases — especially from Google (Gemini) and open-source alternatives. Budget accordingly: what seems expensive today may be cheap within 12 months.
Input tokens are what you send to the model (your prompt, context, system instructions). Output tokens are what the model generates (the response). Output tokens cost 3–5× more than input tokens across most providers. For cost optimization: minimize output length where possible (use structured formats, concise instructions), and use prompt caching to reduce repeated input costs.
Also see: GPT-4o vs Claude Cost · Claude vs GPT Pricing · Gemini API Pricing · OpenAI API Cost Calculator · Claude Haiku Pricing