Current LLM Pricing Reference (per million tokens)
| Model | Input | Output | Cache Read | Context |
|---|---|---|---|---|
| Anthropic | | | | |
| Claude Opus 4.7 | $15.00 | $75.00 | $1.50 | 200k |
| Claude Sonnet 4.6 (Popular) | $3.00 | $15.00 | $0.30 | 200k |
| Claude Haiku 4.5 | $0.80 | $4.00 | $0.08 | 200k |
| OpenAI | | | | |
| GPT-4o (Popular) | $2.50 | $10.00 | $1.25 | 128k |
| GPT-4o-mini (Budget) | $0.15 | $0.60 | $0.075 | 128k |
| Google | | | | |
| Gemini 1.5 Pro | $1.25 | $5.00 | $0.3125 | 1M+ |
| Gemini 2.0 Flash (Cheapest) | $0.10 | $0.40 | $0.025 | 1M+ |
The Cache Read column shows the price of reading cached input; for Claude, a cache write typically costs 1.25× the input price. Prices are approximate, so verify them at the provider sites before making billing decisions.
Anthropic ↗
OpenAI ↗
Google ↗
Frequently Asked Questions
How does LLM API pricing work?
All major LLM APIs charge per million tokens (MTok), with separate rates for input (your prompt) and output (the response). For input, Claude Sonnet 4.6 costs $3/MTok, GPT-4o costs $2.50/MTok, and Gemini 2.0 Flash costs just $0.10/MTok. Output tokens typically cost 4–5× as much as input tokens. This tool estimates per-request and monthly costs based on your prompt length and usage volume.
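The per-request and monthly arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not this tool's actual code: the `Price` class, the function names, the dictionary keys, and the 30-day month are assumptions, and the prices are the approximate figures from the reference table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Approximate prices from the reference table; verify before relying on them.
# The dictionary keys are illustrative labels, not official API model IDs.
PRICES = {
    "claude-sonnet-4.6": Price(3.00, 15.00),
    "gpt-4o": Price(2.50, 10.00),
    "gemini-2.0-flash": Price(0.10, 0.40),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, billing input and output separately."""
    total = (input_tokens * price.input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / 1_000_000


def monthly_cost(price: Price, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days_per_month: int = 30) -> float:
    """Monthly cost in USD, assuming every request has the same size."""
    per_request = request_cost(price, input_tokens, output_tokens)
    return per_request * requests_per_day * days_per_month


if __name__ == "__main__":
    sonnet = PRICES["claude-sonnet-4.6"]
    print(f"per request: ${request_cost(sonnet, 1000, 500):.4f}")
    print(f"per month:   ${monthly_cost(sonnet, 1000, 500, 100):.2f}")
```

With a 1,000-token prompt, a 500-token response, and 100 requests per day, the Claude Sonnet 4.6 figures work out to about $0.0105 per request and $31.50 per month under these assumptions.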
Which LLM API is cheapest?
Gemini 2.0 Flash is currently the cheapest at $0.10/MTok input. GPT-4o-mini ($0.15/MTok) and Claude Haiku 4.5 ($0.80/MTok) are also very affordable. For the best balance of cost and quality, Claude Sonnet 4.6 ($3/MTok) and GPT-4o ($2.50/MTok) are popular. Use this calculator to see exact costs for your specific prompt.
Claude vs GPT-4o — which is cheaper?
GPT-4o ($2.50/MTok input) is slightly cheaper than Claude Sonnet 4.6 ($3.00/MTok) at the base level. However, Claude's prompt caching gives a 90% discount on cached input tokens — making Claude significantly cheaper for apps with large, repeated system prompts. For pure cost, Gemini 2.0 Flash beats both at $0.10/MTok input.
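To see how caching can flip the comparison, here is a rough sketch of the input cost of a single request once the system prompt is already cached. The function name and the 10,000/500 token split are illustrative assumptions; the sketch ignores the one-time cache write and all output tokens, where GPT-4o ($10/MTok) is cheaper than Claude Sonnet 4.6 ($15/MTok).

```python
def input_cost_with_cache(cached_tokens: int, fresh_tokens: int,
                          input_per_mtok: float,
                          cache_read_per_mtok: float) -> float:
    """Input cost in USD for one request whose prefix is served from cache."""
    total = (cached_tokens * cache_read_per_mtok
             + fresh_tokens * input_per_mtok)
    return total / 1_000_000


# Hypothetical workload: 10,000 cached system-prompt tokens plus 500 new tokens.
claude_sonnet = input_cost_with_cache(10_000, 500, 3.00, 0.30)
gpt_4o = input_cost_with_cache(10_000, 500, 2.50, 1.25)

if __name__ == "__main__":
    print(f"Claude Sonnet 4.6 input: ${claude_sonnet:.5f}")
    print(f"GPT-4o input:            ${gpt_4o:.5f}")
```

For this workload the cached Claude input comes to roughly a third of the GPT-4o input cost. An output-heavy workload could still favor GPT-4o, so it is worth running the numbers for both sides of the bill.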
What is Claude prompt caching and how much does it save?
Prompt caching lets Claude reuse a stored version of your system prompt. The first request incurs a cache write surcharge (1.25× input price). Every subsequent request within the TTL window uses cache read pricing — just 10% of normal input cost, a 90% discount. For high-volume apps with large system prompts, this cuts input costs by 80–90%. OpenAI also offers prompt caching at 50% off for GPT-4o. See Anthropic's prompt caching docs.
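The savings depend on how many requests reuse the cached prompt. The sketch below amortizes the write surcharge over a run of requests, assuming every request after the first arrives within the TTL window and hits the cache. The function names are hypothetical; the 1.25× and 0.10× multipliers are the figures quoted above.

```python
def uncached_input_cost(prompt_tokens: int, requests: int,
                        input_per_mtok: float) -> float:
    """Total input cost in USD when every request pays full price."""
    return prompt_tokens * input_per_mtok / 1_000_000 * requests


def cached_input_cost(prompt_tokens: int, requests: int,
                      input_per_mtok: float,
                      write_multiplier: float = 1.25,
                      read_multiplier: float = 0.10) -> float:
    """Total input cost in USD: one cache write, then cache reads."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    base = prompt_tokens * input_per_mtok / 1_000_000
    return base * write_multiplier + base * read_multiplier * (requests - 1)


def savings_fraction(prompt_tokens: int, requests: int,
                     input_per_mtok: float) -> float:
    """Fraction of input cost saved by caching (negative if caching costs more)."""
    full = uncached_input_cost(prompt_tokens, requests, input_per_mtok)
    cached = cached_input_cost(prompt_tokens, requests, input_per_mtok)
    return 1 - cached / full


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, f"{savings_fraction(10_000, n, 3.00):.1%}")
```

Under these assumptions a single request costs 25% more with caching, 10 requests save about 79%, and 100 requests save about 89%, which is where the 80–90% range comes from.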
How accurate is the token count?
This tool uses a 4 characters-per-token approximation — the standard heuristic most developers use for rough estimates. Actual tokenization may vary 5–10% depending on content type and the specific model's tokenizer. English prose is close to 4 chars/token; code and non-Latin scripts may differ. For precise Claude counts, use the Anthropic tokenizer.
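As a rough sketch, the 4 characters-per-token heuristic looks like this in Python. The function name and the choice to round up are assumptions made for illustration; the tool itself may round differently.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic for English prose, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count: characters divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    prompt = "Summarize the following article in three bullet points."
    print(estimate_tokens(prompt))
```

Because this counts characters rather than real tokens, treat the result as an estimate and use the provider's own tokenizer when the exact figure matters.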
How do I reduce my LLM API bill?
1. Enable prompt caching — Claude saves 90% on repeated system prompts; GPT-4o saves 50%.
2. Downgrade model for simple tasks — Gemini 2.0 Flash is 150× cheaper than Claude Opus per input token.
3. Shorten your system prompt — every uncached token costs money on every request.
4. Limit output length — use max_tokens to cap response length where appropriate.
5. Use Batch API — Anthropic's Message Batches API offers 50% off for async workloads.
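A few of the tips above reduce to simple arithmetic. The helpers below are hypothetical and use the approximate prices from the reference table; they show where the 150× figure comes from, what a 50% batch discount does to a bill, and how `max_tokens` bounds the worst-case output cost of a request.

```python
def input_price_ratio(expensive_per_mtok: float,
                      cheap_per_mtok: float) -> float:
    """How many times more the expensive model costs per input token."""
    return expensive_per_mtok / cheap_per_mtok


def with_batch_discount(standard_cost: float, discount: float = 0.50) -> float:
    """Cost in USD after a batch discount (50% per the tip above)."""
    return standard_cost * (1 - discount)


def max_output_cost(max_tokens: int, output_per_mtok: float) -> float:
    """Worst-case output cost in USD of one request capped at max_tokens."""
    return max_tokens * output_per_mtok / 1_000_000


if __name__ == "__main__":
    # Claude Opus input ($15.00) vs Gemini 2.0 Flash input ($0.10).
    print(round(input_price_ratio(15.00, 0.10)))
    # A $15.00 standard bill run through a 50%-off batch endpoint.
    print(f"${with_batch_discount(15.00):.2f}")
    # Output capped at 500 tokens on Claude Sonnet 4.6 ($15.00/MTok output).
    print(f"${max_output_cost(500, 15.00):.4f}")
```

These figures cover only the parts of the bill each tip touches; combining them with cached input costs gives a fuller picture.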