Free · No Signup

LLM API Pricing Calculator

Paste any prompt — instantly compare token counts & API costs across Claude, GPT-4o, and Gemini.

Current LLM Pricing Reference (USD per million tokens)

Model                          Input    Output   Cache Read   Context
Anthropic
  Claude Opus 4.7              $15.00   $75.00   $1.50        200k
  Claude Sonnet 4.6 (Popular)  $3.00    $15.00   $0.30        200k
  Claude Haiku 4.5             $0.80    $4.00    $0.08        200k
OpenAI
  GPT-4o (Popular)             $2.50    $10.00   $1.25        128k
  GPT-4o-mini (Budget)         $0.15    $0.60    $0.075       128k
Google
  Gemini 1.5 Pro               $1.25    $5.00    $0.3125      1M+
  Gemini 2.0 Flash (Cheapest)  $0.10    $0.40    $0.025       1M+
The Cache Read column shows the cache read price; cache writes are typically billed at 1.25× the input price. Prices are approximate, so verify them on the Anthropic, OpenAI, and Google pricing pages before making billing decisions.

Frequently Asked Questions

How does LLM API pricing work?
All major LLMs charge per million tokens (MTok), with separate rates for input (your prompt) and output (the response). For input, Claude Sonnet 4.6 costs $3/MTok, GPT-4o costs $2.50/MTok, and Gemini 2.0 Flash costs just $0.10/MTok. Output tokens typically cost 4–5× as much as input tokens. This tool estimates per-request and monthly costs based on your prompt length and usage volume.
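As a rough sketch of the arithmetic described above, the snippet below computes a per-request cost and a monthly estimate from token counts and per-MTok rates. The function names, the 30-day month, and the example token counts are assumptions made for illustration; the rates are copied from the reference table on this page and may be out of date.

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD of one request; rates are USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost(per_request, requests_per_day, days=30):
    """Monthly estimate, assuming a 30-day month and constant daily volume."""
    return per_request * requests_per_day * days


# Hypothetical workload: 1,000 input tokens and 500 output tokens per request
# on Claude Sonnet 4.6 ($3 input / $15 output per MTok), 100 requests a day.
per_request = request_cost(1_000, 500, 3.00, 15.00)
per_month = monthly_cost(per_request, requests_per_day=100)
print(f"${per_request:.4f} per request, ${per_month:.2f} per month")
```

Note that in this example the output tokens account for most of the cost even though there are half as many of them, which is why output length matters as much as prompt length.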
Which LLM API is cheapest?
Gemini 2.0 Flash is currently the cheapest at $0.10/MTok input. GPT-4o-mini ($0.15/MTok) and Claude Haiku 4.5 ($0.80/MTok) are also very affordable. For the best balance of cost and quality, Claude Sonnet 4.6 ($3/MTok) and GPT-4o ($2.50/MTok) are popular. Use this calculator to see exact costs for your specific prompt.
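To make the comparison concrete, here is a minimal sketch that ranks the seven models by cost for one request. The RATES dictionary is copied from the reference table on this page; the function name and the example token counts are hypothetical.

```python
# USD per million tokens as (input, output), copied from the reference table.
RATES = {
    "Claude Opus 4.7": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
}


def rank_by_cost(input_tokens, output_tokens, rates=RATES):
    """Return (model, cost in USD) pairs sorted from cheapest to most expensive."""
    costs = {
        model: (input_tokens * inp + output_tokens * out) / 1_000_000
        for model, (inp, out) in rates.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical request: 2,000 input tokens and 500 output tokens.
for model, cost in rank_by_cost(2_000, 500):
    print(f"{model:<18} ${cost:.5f}")
```

The ordering can shift with the input-to-output mix, since each model has a different ratio between its input and output rates.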
Claude vs GPT-4o — which is cheaper?
GPT-4o ($2.50/MTok input) is slightly cheaper than Claude Sonnet 4.6 ($3.00/MTok) at the base level. However, Claude's prompt caching gives a 90% discount on cached input tokens — making Claude significantly cheaper for apps with large, repeated system prompts. For pure cost, Gemini 2.0 Flash beats both at $0.10/MTok input.
What is Claude prompt caching and how much does it save?
Prompt caching lets Claude reuse a stored version of your system prompt. The first request incurs a cache write surcharge (1.25× input price). Every subsequent request within the TTL window uses cache read pricing — just 10% of normal input cost, a 90% discount. For high-volume apps with large system prompts, this cuts input costs by 80–90%. OpenAI also offers prompt caching at 50% off for GPT-4o. See Anthropic's prompt caching docs.
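The savings can be estimated with a simplified model: one cache write at 1.25× the input rate, followed by cache reads at 10% of the input rate. The sketch below assumes every later request arrives within the TTL window so the cache never expires, which is an idealisation; the function names and the workload are hypothetical.

```python
def uncached_prompt_cost(prompt_tokens, requests, input_rate):
    """USD cost of sending the same prompt uncached on every request."""
    return requests * prompt_tokens * input_rate / 1_000_000


def cached_prompt_cost(prompt_tokens, requests, input_rate,
                       write_multiplier=1.25, read_multiplier=0.10):
    """USD cost with one cache write, then cache reads (assumes no expiry)."""
    if requests < 1:
        return 0.0
    write = prompt_tokens * input_rate * write_multiplier
    reads = (requests - 1) * prompt_tokens * input_rate * read_multiplier
    return (write + reads) / 1_000_000


# Hypothetical workload: 10,000-token system prompt, 1,000 requests,
# Claude Sonnet 4.6 input rate of $3/MTok.
uncached = uncached_prompt_cost(10_000, 1_000, 3.00)
cached = cached_prompt_cost(10_000, 1_000, 3.00)
saving = 1 - cached / uncached
print(f"uncached ${uncached:.2f}, cached ${cached:.4f}, saving {saving:.1%}")
```

With few requests the write surcharge dominates, and a single request costs more with caching than without, so the discount only pays off when the cached prompt is actually reused.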
How accurate is the token count?
This tool uses a 4 characters-per-token approximation — the standard heuristic most developers use for rough estimates. Actual tokenization may vary 5–10% depending on content type and the specific model's tokenizer. English prose is close to 4 chars/token; code and non-Latin scripts may differ. For precise Claude counts, use the Anthropic tokenizer.
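The heuristic itself is a one-liner. The sketch below is an assumption about how such an estimate might be written, not this tool's actual code; in particular, rounding up rather than down is a choice made here for illustration.

```python
import math

CHARS_PER_TOKEN = 4  # the rough heuristic described above


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Approximate token count from character length (rounded up)."""
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarise the following article in three bullet points."
print(estimate_tokens(prompt), "tokens (approximate)")
```

Because the estimate depends only on character count, it cannot reflect differences between tokenizers, so it is best treated as a budgeting aid rather than a billing figure.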
How do I reduce my LLM API bill?
1. Enable prompt caching — Claude saves 90% on repeated system prompts; GPT-4o saves 50%.
2. Downgrade model for simple tasks — Gemini 2.0 Flash is 150× cheaper than Claude Opus per input token.
3. Shorten your system prompt — every uncached token costs money on every request.
4. Limit output length — use max_tokens to cap response length where appropriate.
5. Use Batch API — Anthropic's Message Batches API offers 50% off for async workloads.
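Some of the levers in this list reduce to simple arithmetic. The sketch below illustrates the price ratio between two models, the effect of capping output length, and a 50% batch discount. The function names and the example token counts are hypothetical, and the rates come from the reference table on this page.

```python
def price_ratio(expensive_rate, cheap_rate):
    """How many times one per-MTok rate is of another."""
    return expensive_rate / cheap_rate


def output_cost(expected_output_tokens, output_rate, max_tokens=None):
    """Output cost in USD; a max_tokens cap bounds the billed output tokens."""
    billed = expected_output_tokens
    if max_tokens is not None:
        billed = min(expected_output_tokens, max_tokens)
    return billed * output_rate / 1_000_000


def batch_cost(standard_cost, discount=0.50):
    """Cost after a batch discount (50% assumed, per the list above)."""
    return standard_cost * (1 - discount)


# Claude Opus 4.7 ($15/MTok input) versus Gemini 2.0 Flash ($0.10/MTok input).
print(f"input price ratio: {price_ratio(15.00, 0.10):.0f}x")

# Hypothetical 2,000-token response at $15/MTok output, capped at 500 tokens.
uncapped = output_cost(2_000, 15.00)
capped = output_cost(2_000, 15.00, max_tokens=500)
print(f"${uncapped:.4f} uncapped, ${capped:.4f} capped, "
      f"${batch_cost(capped):.5f} capped and batched")
```

A cap saves money by truncating the response, so it suits tasks where a short answer is acceptable rather than ones where the full output is needed.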

How do I estimate the cost of a Claude API request?
Paste your prompt into the calculator. The tool estimates its token count client-side, using the 4 characters-per-token approximation, and multiplies the input and output token counts by each model's rates, producing a per-request cost for Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, GPT-4o and Gemini side by side.
Which LLM is cheapest for high-volume requests?
For short prompts and short outputs, Gemini 2.0 Flash and GPT-4o-mini are typically cheapest, with Claude Haiku 4.5 the cheapest Claude option. For long-context or heavily-cached workloads, Claude Sonnet 4.6 with prompt caching often wins on cost-per-quality.
Does this tool store or send my prompts anywhere?
No. Tokenisation and cost math run entirely in your browser. Your prompt text is never POSTed to a server. The share link encodes the prompt into the URL fragment client-side.
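Browsers do not include the URL fragment (the part after #) in HTTP requests, which is what makes a fragment-based share link compatible with the claim above. The sketch below shows one way such a link could be built and read back; the base URL, the "prompt" key, and the percent-encoding scheme are assumptions for illustration, not this tool's actual format.

```python
from urllib.parse import quote, unquote, urlsplit


def build_share_link(base_url, prompt):
    """Put the percent-encoded prompt in the fragment of the URL."""
    return f"{base_url}#prompt={quote(prompt, safe='')}"


def read_share_link(link):
    """Recover the prompt from a link made by build_share_link, else None."""
    fragment = urlsplit(link).fragment
    key, _, value = fragment.partition("=")
    return unquote(value) if key == "prompt" else None


# Hypothetical base URL and prompt.
link = build_share_link("https://example.com/calculator", "Summarise: a & b #1")
print(link)
print(read_share_link(link))
```

Anyone who receives the link can still read the prompt from it, so a shared link should be treated as containing the prompt text itself.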