Gemini API Pricing 2026

Gemini 2.0 Flash, Flash-Lite, 1.5 Pro & 1.5 Flash: cost per token, context caching, and comparison vs Claude & GPT-4o

Gemini API Pricing Table (per million tokens)

Model                   Input (per 1M)  Output (per 1M)  Cache read (per 1M)  Context window
Gemini 2.0 Flash        $0.10           $0.40            $0.025               1M tokens
Gemini 2.0 Flash-Lite   $0.075          $0.30            $0.019               1M tokens
Gemini 1.5 Pro (≤128k)  $1.25           $5.00            $0.3125              1M tokens
Gemini 1.5 Pro (>128k)  $2.50           $10.00           $0.625               1M tokens
Gemini 1.5 Flash        $0.075          $0.30            $0.019               1M tokens

Gemini 1.5 Pro is tiered by prompt length: the ≤128k rates apply to prompts up to 128k tokens, the >128k rates to longer prompts.
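To turn the table into a per-request estimate, the sketch below encodes the listed rates and picks the Gemini 1.5 Pro tier from the prompt length. It is a minimal sketch under stated assumptions: the dictionary keys are illustrative labels, "128k" is read as 128,000 tokens, and the prices are copied from this page, so re-check them against Google's pricing page before relying on the output.

```python
# Minimal per-request cost estimator built from the pricing table above.
# Prices are USD per 1M tokens; model keys are illustrative labels (assumption).

PRICES = {
    "gemini-2.0-flash":      {"input": 0.10,  "output": 0.40, "cache_read": 0.025},
    "gemini-2.0-flash-lite": {"input": 0.075, "output": 0.30, "cache_read": 0.019},
    "gemini-1.5-flash":      {"input": 0.075, "output": 0.30, "cache_read": 0.019},
}

# Gemini 1.5 Pro is priced in two tiers depending on prompt length.
PRO_SHORT = {"input": 1.25, "output": 5.00, "cache_read": 0.3125}  # prompts <= 128k tokens
PRO_LONG = {"input": 2.50, "output": 10.00, "cache_read": 0.625}   # prompts > 128k tokens
PRO_THRESHOLD = 128_000  # assumption: "128k" read as 128,000 tokens


def rates_for(model: str, prompt_tokens: int) -> dict:
    """Return the per-1M-token rates that apply to a prompt of this length."""
    if model == "gemini-1.5-pro":
        return PRO_LONG if prompt_tokens > PRO_THRESHOLD else PRO_SHORT
    return PRICES[model]


def request_cost(model: str, prompt_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    prompt_tokens is the full prompt length; cached_tokens is the part of it
    served from a context cache and billed at the cache-read rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    rates = rates_for(model, prompt_tokens)
    fresh_tokens = prompt_tokens - cached_tokens
    # Rates are per 1M tokens, so divide the weighted token sum by 1,000,000.
    return (
        fresh_tokens * rates["input"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    ) / 1_000_000


print(f"{request_cost('gemini-2.0-flash', 300, 100):.6f}")      # short prompt on 2.0 Flash
print(f"{request_cost('gemini-1.5-pro', 200_000, 1_000):.4f}")  # long prompt hits the >128k tier
```

The tier lookup is the main design choice: for 1.5 Pro the same request can cost twice as much per token once the prompt crosses the threshold, so the prompt length has to be known before the rate is chosen.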

Gemini vs Claude vs GPT-4o — Full Comparison

Model                   Provider   Input (per 1M)  Output (per 1M)  Best for
Gemini 2.0 Flash        Google     $0.10           $0.40            High-volume, cost-sensitive tasks (cheapest listed)
Claude Haiku 4.5        Anthropic  $0.80           $4.00            Low-cost Claude; 90% discount on cache reads
GPT-4o-mini             OpenAI     $0.15           $0.60            Budget OpenAI option for simple tasks
Claude Sonnet 4.6       Anthropic  $3.00           $15.00           Balanced quality and cost (best value)
GPT-4o                  OpenAI     $2.50           $10.00           OpenAI ecosystem apps
Gemini 1.5 Pro (≤128k)  Google     $1.25           $5.00            Massive context (whole codebases)

Key Insights: When to Use Gemini

Best for ultra-high volume

Gemini 2.0 Flash at $0.10/MTok input is 30× cheaper than Claude Sonnet 4.6 ($3.00) and 25× cheaper than GPT-4o ($2.50). For classification, extraction, and summarization at millions of calls per day, Flash cuts costs dramatically.

Best for 1M token context

Gemini 1.5 Pro supports a 1-million-token context window, among the largest of any production LLM. That is enough for entire codebases, legal document analysis, or multi-book research without chunking.

Context caching advantage

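Gemini's context caching bills repeated input tokens at a 75% discount. For a chatbot on Gemini 2.0 Flash with a large cached system prompt, those cached tokens cost about $0.025/MTok. Claude's cache discount is deeper (90%), but applied to higher base prices it still leaves cached reads at $0.08/MTok on Haiku 4.5 and $0.30/MTok on Sonnet 4.6.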
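The cached rate covers only the cached part of the prompt; every chatbot turn also sends some uncached tokens. The sketch below computes a blended per-request input rate for a hypothetical turn (an 8,000-token cached system prompt plus 500 fresh tokens), using the list prices of $0.10 for Gemini 2.0 Flash (75% off when cached) and $0.80 for Claude Haiku 4.5 (90% off when cached). Cache-write fees and storage charges are deliberately left out.

```python
def blended_input_rate(base_rate: float, cache_discount: float,
                       cached_tokens: int, fresh_tokens: int) -> float:
    """Effective USD per 1M input tokens for one request.

    cached_tokens are billed at base_rate * (1 - cache_discount);
    fresh_tokens (e.g. the new user message) are billed at base_rate.
    Cache-write surcharges and storage fees are ignored in this sketch.
    """
    total_tokens = cached_tokens + fresh_tokens
    if total_tokens == 0:
        raise ValueError("request must contain at least one input token")
    cached_rate = base_rate * (1 - cache_discount)
    return (cached_tokens * cached_rate + fresh_tokens * base_rate) / total_tokens


# Hypothetical chatbot turn: 8,000-token cached system prompt + 500 fresh tokens.
gemini_flash = blended_input_rate(0.10, 0.75, 8_000, 500)
claude_haiku = blended_input_rate(0.80, 0.90, 8_000, 500)
print(f"Gemini 2.0 Flash: ${gemini_flash:.4f}/MTok")
print(f"Claude Haiku 4.5: ${claude_haiku:.4f}/MTok")
```

The blended rate lands somewhat above the $0.025 cache-read price because the uncached user message is still billed at the full rate; the larger the cached share of each prompt, the closer the blend gets to the cached rate.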

Real-World Cost: 1M API Calls/Month

Assume: 300 input tokens + 100 output tokens per call (simple summarization task), i.e. 300M input tokens and 100M output tokens per month.

Model               Monthly input cost  Monthly output cost  Total per month
Gemini 2.0 Flash    $30                 $40                  $70
GPT-4o-mini         $45                 $60                  $105
Claude Haiku 4.5    $240                $400                 $640
Claude Sonnet 4.6   $900                $1,500               $2,400
GPT-4o              $750                $1,000               $1,750
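The monthly figures can be reproduced from the per-token prices with a short script, which also makes it easy to plug in a different call volume or token count. The 1M calls, 300 input tokens, and 100 output tokens per call are the assumptions stated above; prices are copied from the comparison table and will need updating as providers change them.

```python
CALLS_PER_MONTH = 1_000_000
INPUT_TOKENS_PER_CALL = 300
OUTPUT_TOKENS_PER_CALL = 100

# (input, output) list prices in USD per 1M tokens, from the comparison table.
MODEL_PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
}


def monthly_cost(input_price: float, output_price: float,
                 calls: int = CALLS_PER_MONTH,
                 input_tokens: int = INPUT_TOKENS_PER_CALL,
                 output_tokens: int = OUTPUT_TOKENS_PER_CALL) -> tuple:
    """Return (input_cost, output_cost, total) in USD for one month."""
    input_mtok = calls * input_tokens / 1_000_000    # 1M calls x 300 tokens = 300 MTok
    output_mtok = calls * output_tokens / 1_000_000  # 1M calls x 100 tokens = 100 MTok
    input_cost = input_mtok * input_price
    output_cost = output_mtok * output_price
    return input_cost, output_cost, input_cost + output_cost


for name, (inp, out) in MODEL_PRICES.items():
    i_cost, o_cost, total = monthly_cost(inp, out)
    print(f"{name:<18} input ${i_cost:>7,.0f}  output ${o_cost:>7,.0f}  total ${total:>7,.0f}")
```

Output prices are 4 to 5 times input prices for every model listed, so raising `output_tokens` widens the gap between models faster than raising `input_tokens` does.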


Frequently Asked Questions

How much does Gemini 2.0 Flash cost per token?

Gemini 2.0 Flash costs $0.10 per million input tokens and $0.40 per million output tokens. That makes it one of the cheapest production-grade LLMs available (only Flash-Lite and 1.5 Flash, at $0.075/$0.30, are cheaper in the Gemini pricing table): 30× cheaper than Claude Sonnet 4.6 on input and 25× cheaper than GPT-4o on output.
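The price multiples follow directly from the list prices in the comparison table (Claude Sonnet 4.6 at $3.00/$15.00, GPT-4o at $2.50/$10.00 per 1M tokens); a few lines of Python make the arithmetic explicit. The prices are hard-coded from this page and are assumptions to be refreshed over time.

```python
# (input, output) list prices in USD per 1M tokens, from the comparison table.
FLASH = (0.10, 0.40)
RIVALS = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
}


def price_multiple(rival: tuple, base: tuple = FLASH) -> tuple:
    """How many times more the rival costs than the base model, for input and output."""
    return rival[0] / base[0], rival[1] / base[1]


for name, prices in RIVALS.items():
    inp, out = price_multiple(prices)
    print(f"{name}: {inp:.1f}x input, {out:.1f}x output vs Gemini 2.0 Flash")
```

The multiples come out to 30× (input) and 37.5× (output) against Claude Sonnet 4.6, and 25× on both sides against GPT-4o.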

Is Gemini cheaper than Claude or GPT-4o?

Yes, significantly. Gemini 2.0 Flash undercuts every Claude and OpenAI model in the comparison table, and Flash-Lite is cheaper still. The tradeoff: Claude Sonnet and GPT-4o generally score higher on reasoning benchmarks, instruction following, and complex code generation. For simpler high-volume tasks (classification, extraction, summarization), Gemini Flash offers far better cost-efficiency.

Does Gemini offer free API usage?

Yes. Google AI Studio provides a free tier for Gemini Flash with rate limits (15 requests/minute, 1 million tokens/minute). Once you enable billing, you get full production quotas. The free tier is generous enough for prototyping and low-volume production apps.
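Staying under per-minute limits is easier with a client-side guard. The sketch below is a hypothetical sliding-window limiter (not part of any Google SDK) that defaults to the 15 requests/minute and 1 million tokens/minute figures quoted above; the token counts you pass in are your own estimates, and Google's server-side accounting remains authoritative.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingWindowLimiter:
    """Hypothetical client-side guard for per-minute request and token limits."""

    def __init__(self, max_requests: int = 15, max_tokens: int = 1_000_000,
                 window_s: float = 60.0) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens) per admitted request
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the window and release their tokens.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        """Admit and record the request if both limits allow it; otherwise return False."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._events) >= self.max_requests:
            return False
        if self._tokens_in_window + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


limiter = SlidingWindowLimiter()
print("send request" if limiter.try_acquire(tokens=1_200) else "wait and retry")
```

Because the window slides, a burst of 15 requests blocks further calls only until the oldest request is more than 60 seconds old. Passing `now` explicitly makes the limiter deterministic to test.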

What is Gemini 1.5 Pro's context window?

Gemini 1.5 Pro supports up to 1 million tokens of context, one of the largest windows of any generally available LLM. This allows processing entire codebases, large PDFs, or multi-hour audio transcripts in a single request. Input costs $1.25/MTok for prompts up to 128k tokens and $2.50/MTok beyond that, so requests that actually use the long context are billed at the higher tier.

How does Gemini's context caching work?

Gemini's context caching stores a reusable portion of your prompt server-side. Cached tokens on 2.0 Flash are billed at $0.025/MTok, a 75% discount vs the standard $0.10/MTok rate. Cache writes carry no per-write surcharge (unlike Claude's 1.25× cache-write fee), although Google bills separately for storing cached content, per token per hour. Cache TTL defaults to 1 hour but can be configured up to 48 hours.
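The write/read asymmetry is easier to weigh with numbers. This sketch models one cache write followed by a number of cache hits on a shared prompt prefix, using the multipliers described here (Gemini: writes at the standard rate, reads at 25% of it; Claude: writes at 1.25×, reads at 10%). It is a simplified model: the 100k-token document and 50 reuses are hypothetical, only the cached prefix is priced, and the optional storage term defaults to zero because its rate is not listed on this page.

```python
def cached_prefix_cost(prefix_tokens: int, cache_hits: int, base_rate: float,
                       write_multiplier: float, read_multiplier: float,
                       storage_rate_per_mtok_hour: float = 0.0,
                       hours: float = 0.0) -> float:
    """USD cost of writing a prompt prefix to cache once and reading it cache_hits times.

    Rates are USD per 1M tokens. Fresh input and output tokens are excluded.
    """
    mtok = prefix_tokens / 1_000_000
    write = mtok * base_rate * write_multiplier
    reads = cache_hits * mtok * base_rate * read_multiplier
    storage = mtok * storage_rate_per_mtok_hour * hours  # optional hourly storage, 0 by default
    return write + reads + storage


# Hypothetical workload: a 100k-token document reused in 50 follow-up requests.
gemini_pro = cached_prefix_cost(100_000, 50, base_rate=1.25,
                                write_multiplier=1.0, read_multiplier=0.25)
claude_sonnet = cached_prefix_cost(100_000, 50, base_rate=3.00,
                                   write_multiplier=1.25, read_multiplier=0.10)
print(f"Gemini 1.5 Pro (<=128k tier): ${gemini_pro:.4f}")
print(f"Claude Sonnet 4.6:            ${claude_sonnet:.4f}")
```

Under these assumptions Gemini 1.5 Pro comes to $1.6875 and Claude Sonnet 4.6 to $1.875 before any storage charge: Claude's deeper read discount is offset by its higher base rate, and a nonzero storage term narrows or reverses the gap the longer the cache is kept alive.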

Gemini vs Claude for long documents — which is better?

Gemini 1.5 Pro wins on context size (1M tokens vs Claude's 200k), and at $1.25/MTok for prompts up to 128k tokens it's cheaper than Claude Sonnet ($3/MTok) on input. However, Claude's 200k context with prompt caching can process large documents at $0.30/MTok (10% of $3) for cached content. That is on par with Gemini 1.5 Pro's cached rate of $0.3125/MTok for prompts up to 128k tokens, and less than half its $0.625/MTok cached rate for longer prompts.
