Gemini 2.0 Flash, 1.5 Pro & Ultra — cost per token, context caching, and comparison vs Claude & GPT-4o
| Model | Input (per 1M) | Output (per 1M) | Cache Read (per 1M) | Context |
|---|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | 1M tokens |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | $0.019 | 1M tokens |
| Gemini 1.5 Pro (≤128k) | $1.25 | $5.00 | $0.3125 | 1M tokens |
| Gemini 1.5 Pro (>128k) | $2.50 | $10.00 | $0.625 | 1M tokens |
| Gemini 1.5 Flash | $0.075 | $0.30 | $0.019 | 1M tokens |
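
The two Gemini 1.5 Pro rows describe tiered pricing that depends on prompt length. A minimal sketch of how a per-request cost could be estimated from those rows follows. The function name, the 128,000-token reading of "128k", and the assumption that the whole request is billed at the tier selected by its prompt length are illustrative choices, not confirmed billing rules.

```python
# USD per 1M tokens, copied from the Gemini 1.5 Pro rows of the table above.
GEMINI_15_PRO_TIERS = {
    "short": {"input": 1.25, "output": 5.00},   # prompts up to 128k tokens
    "long": {"input": 2.50, "output": 10.00},   # prompts above 128k tokens
}

# Assumption: "128k" is read as 128,000 tokens.
TIER_THRESHOLD = 128_000


def gemini_15_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: the entire request is billed at the tier chosen by
    the prompt length (hypothetical helper, not an official API).
    """
    tier = "short" if input_tokens <= TIER_THRESHOLD else "long"
    rates = GEMINI_15_PRO_TIERS[tier]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


if __name__ == "__main__":
    # A 100k-token prompt falls in the lower tier, a 200k-token prompt in the higher one.
    print(f"100k prompt: ${gemini_15_pro_cost(100_000, 1_000):.4f}")
    print(f"200k prompt: ${gemini_15_pro_cost(200_000, 1_000):.4f}")
```

Under these assumptions, doubling the prompt from 100k to 200k tokens roughly quadruples the input cost, because the per-token rate doubles as well.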
| Model | Provider | Input (per 1M) | Output (per 1M) | Best For |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | High-volume, cost-sensitive tasks |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | Cheap Claude with 90% cache discount |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | Budget OpenAI with simple tasks |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Balanced quality + cost |
| GPT-4o | OpenAI | $2.50 | $10.00 | OpenAI ecosystem apps |
| Gemini 1.5 Pro | Google | $1.25 | $5.00 | Massive context (whole codebases) |
Gemini 2.0 Flash at $0.10/MTok input is 30× cheaper than Claude Sonnet ($3.00/MTok) and 25× cheaper than GPT-4o ($2.50/MTok). For classification, extraction, and summarization at millions of calls per day, Flash cuts costs dramatically.
Gemini 1.5 Pro supports a 1 million token context, one of the largest among production LLMs. This is useful for entire codebases, legal document analysis, or multi-book research without chunking.
Gemini's context caching gives a 75% discount on repeated inputs. For chatbots with large system prompts on Gemini 2.0 Flash, cached input costs about $0.025/MTok, which stays competitive in absolute terms even against Claude's larger 90% cache discount.
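
To see how the cache discount affects a real workload, where only part of each prompt is served from cache, the blended input rate can be sketched as below. The function name and the idea of a single "cached fraction" are illustrative assumptions. The rates and discounts come from this page.

```python
def effective_input_rate(base_rate: float, cache_discount: float, cached_fraction: float) -> float:
    """Blended input price in USD per 1M tokens.

    base_rate       -- standard input price per 1M tokens
    cache_discount  -- discount on cached tokens (0.75 means 75% off)
    cached_fraction -- share of input tokens served from cache (0.0 to 1.0);
                       this single-number model is a simplifying assumption
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_rate = base_rate * (1.0 - cache_discount)
    return cached_fraction * cached_rate + (1.0 - cached_fraction) * base_rate


if __name__ == "__main__":
    # Gemini 2.0 Flash: $0.10/MTok with a 75% cache discount.
    print(effective_input_rate(0.10, 0.75, 1.0))
    print(effective_input_rate(0.10, 0.75, 0.8))
    # Claude Sonnet: $3.00/MTok with a 90% cache discount.
    print(effective_input_rate(3.00, 0.90, 0.8))
```

With every input token cached, the Flash rate works out to the $0.025/MTok quoted above. With a lower cached fraction the blended rate moves back toward the standard price.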
Assume: 1 million calls per month, each with 300 input tokens + 100 output tokens (simple summarization task).
| Model | Monthly Input Cost | Monthly Output Cost | Total/Month |
|---|---|---|---|
| Gemini 2.0 Flash | $30 | $40 | $70 |
| GPT-4o-mini | $45 | $60 | $105 |
| Claude Haiku 4.5 | $240 | $400 | $640 |
| Claude Sonnet 4.6 | $900 | $1,500 | $2,400 |
| GPT-4o | $750 | $1,000 | $1,750 |
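
The monthly figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes 1,000,000 calls per month, a volume inferred from the table's numbers rather than stated explicitly. The `PRICES` dictionary and the function name are illustrative.

```python
# (input, output) prices in USD per 1M tokens, taken from the comparison table.
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
}


def monthly_cost(model: str, calls_per_month: int,
                 input_tokens: int = 300, output_tokens: int = 100):
    """Return (input_cost, output_cost, total) in USD for one month."""
    in_rate, out_rate = PRICES[model]
    input_cost = calls_per_month * input_tokens * in_rate / 1_000_000
    output_cost = calls_per_month * output_tokens * out_rate / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


if __name__ == "__main__":
    # Assumption: 1M calls/month, inferred from the table's figures.
    for name in PRICES:
        inp, out, total = monthly_cost(name, 1_000_000)
        print(f"{name:<18} input ${inp:>8,.2f}  output ${out:>8,.2f}  total ${total:>9,.2f}")
```

Changing `calls_per_month`, `input_tokens`, or `output_tokens` scales the costs linearly, so the same function can be reused for other workloads.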
Gemini 2.0 Flash costs $0.10 per million input tokens and $0.40 per million output tokens. At these prices it is among the cheapest production-grade LLMs in this comparison: 30× cheaper than Claude Sonnet 4.6 on input, and 25× cheaper than GPT-4o on both input and output.
Yes, significantly. Gemini 2.0 Flash is the cheapest among major LLM APIs. The tradeoff: Claude Sonnet and GPT-4o score higher on reasoning benchmarks, instruction following, and complex code generation. For simpler high-volume tasks (classification, extraction, summarization), Gemini Flash offers far better cost-efficiency.
Yes. Google's AI Studio provides a free tier for Gemini Flash with rate limits (15 requests/minute, 1 million tokens/minute). Once you enable billing, you access full production quotas. The free tier is generous enough for prototyping and low-volume production apps.
Gemini 1.5 Pro supports up to 1 million tokens of context, one of the largest among generally available LLMs. This allows processing entire codebases, large PDFs, or multi-hour audio transcripts in a single request. At $1.25/MTok input (≤128k) it's competitively priced for long-context tasks.
Gemini's context caching stores a portion of your prompt server-side. Cache read tokens cost $0.025/MTok for Flash — a 75% discount vs the standard $0.10/MTok rate. Cache writes have no surcharge (unlike Claude's 1.25× cache write fee). Cache TTL defaults to 1 hour but can be configured up to 48 hours.
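
The effect of the write surcharge and the read discount can be compared with a small calculation. This is a sketch under simplifying assumptions: the first request pays the cache write, every later request reads from cache, and cache storage fees and expiry are ignored. The function name and parameters are hypothetical. The multipliers come from the figures on this page.

```python
def cached_prefix_cost(prefix_tokens: int, requests: int, base_rate: float,
                       write_multiplier: float, read_multiplier: float) -> float:
    """Input cost (USD) of reusing one prompt prefix across `requests` calls.

    Assumptions: the first call writes the cache, the remaining calls read it,
    and storage fees and cache expiry are ignored.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    mtok = prefix_tokens / 1_000_000
    write_cost = mtok * base_rate * write_multiplier
    read_cost = mtok * base_rate * read_multiplier * (requests - 1)
    return write_cost + read_cost


if __name__ == "__main__":
    prefix, n = 100_000, 10
    # Gemini 2.0 Flash: no write surcharge (1.0x), reads at 25% of $0.10/MTok.
    gemini = cached_prefix_cost(prefix, n, 0.10, 1.0, 0.25)
    # Claude Sonnet: 1.25x write fee, reads at 10% of $3.00/MTok.
    claude = cached_prefix_cost(prefix, n, 3.00, 1.25, 0.10)
    print(f"Gemini 2.0 Flash: ${gemini:.4f}")
    print(f"Claude Sonnet:    ${claude:.4f}")
```

Because the write fee is paid once and the read discount applies to every later call, the read multiplier dominates the total as the number of requests grows.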
Gemini 1.5 Pro wins on context size (1M tokens vs Claude's 200k), and at $1.25/MTok it's cheaper than Claude Sonnet ($3/MTok) for input. However, Claude's 200k context with prompt caching can process large documents at $0.30/MTok (10% of $3) for cached content — competitive with Gemini's cached rate of $0.3125/MTok for 1.5 Pro.