Question 1

How does LLM API pricing work?

Accepted Answer

All major LLM APIs price usage per million tokens (MTok), with separate rates for input (your prompt) and output (the response). For input, Claude Sonnet 4.6 costs $3/MTok, GPT-4o costs $2.50/MTok, and Gemini 2.0 Flash costs just $0.10/MTok. Output tokens typically cost 4-5x as much as input tokens. This tool estimates per-request and monthly costs based on your prompt length and usage volume.
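The arithmetic behind a per-request estimate can be sketched as follows. This is a minimal illustration, not the calculator's actual code: the function names `request_cost` and `monthly_cost` are hypothetical, the 30-day month is an assumption, and the example prices are the Claude Sonnet 4.6 rates listed on this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request; prices are in USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Monthly cost in USD, assuming a fixed daily volume (30-day month by default)."""
    return per_request * requests_per_day * days


# Example: 2,000 input tokens and 500 output tokens at $3 / $15 per MTok.
cost = request_cost(2_000, 500, input_price=3.00, output_price=15.00)
print(f"per request: ${cost:.4f}")
print(f"per month at 1,000 requests/day: ${monthly_cost(cost, 1_000):.2f}")
```

With these inputs the sketch works out to $0.0135 per request and $405.00 per month. More than half of that comes from the 500 output tokens, which is why output length matters so much.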

Question 2

Which LLM API is cheapest?

Accepted Answer

Gemini 2.0 Flash is currently the cheapest major LLM API at $0.10/MTok input and $0.40/MTok output. GPT-4o-mini ($0.15 input/$0.60 output) and Claude Haiku 4.5 ($0.80/$4.00) are also very affordable. For the best balance of cost and capability, Claude Sonnet 4.6 ($3/$15) and GPT-4o ($2.50/$10) are popular choices. Use this calculator to see exact costs for your specific prompt and volume.

Question 3

Claude vs GPT-4o — which is cheaper?

Accepted Answer

On list price, GPT-4o ($2.50/MTok input) is slightly cheaper than Claude Sonnet 4.6 ($3.00/MTok input). However, Claude's prompt caching gives a 90% discount on cached input tokens, versus 50% for GPT-4o, which can make Claude significantly cheaper on input for apps with large, repeated system prompts. For pure cost, Gemini 2.0 Flash beats both at $0.10/MTok input. The best model depends on your use case, latency requirements, and whether you can use prompt caching.
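To see roughly where the crossover falls, the sketch below computes a blended input price for a given share of cached tokens, using the input and cache-read prices from the pricing table on this page. The helper `blended_input_price` is hypothetical, and the model is deliberately simplified: it ignores cache write surcharges and output costs.

```python
def blended_input_price(input_price: float, cache_read_price: float,
                        cached_fraction: float) -> float:
    """Effective USD/MTok when `cached_fraction` of input tokens are cache reads.

    Simplification: ignores the one-time cache write surcharge and output cost.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return (1 - cached_fraction) * input_price + cached_fraction * cache_read_price


for fraction in (0.0, 0.5, 0.9):
    claude = blended_input_price(3.00, 0.30, fraction)  # Claude Sonnet 4.6
    gpt4o = blended_input_price(2.50, 1.25, fraction)   # GPT-4o
    print(f"{fraction:.0%} cached: Claude ${claude:.3f}/MTok, GPT-4o ${gpt4o:.3f}/MTok")
```

Under these prices, Claude Sonnet 4.6 becomes cheaper on input once about 35% of input tokens are served from cache. Output is still priced at $15/MTok versus $10/MTok, so output-heavy workloads can tilt the comparison back toward GPT-4o.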

Question 4

What is prompt caching and how much does it save?

Accepted Answer

Prompt caching lets Claude reuse a stored version of your system prompt. The first request incurs a cache write surcharge (1.25x input price). Every subsequent request within the TTL window uses cache read pricing — just 10% of normal input cost, a 90% discount. For high-volume apps with large system prompts, this typically cuts input costs by 80-90%. OpenAI also offers prompt caching at 50% off for GPT-4o cached inputs.
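As a rough worked example of the 1.25x write surcharge and 10% read price described above, the sketch below compares the cost of sending the same system prompt repeatedly with and without caching. The function names are hypothetical, and it assumes every request after the first arrives inside the TTL window, so there is exactly one cache write.

```python
def cached_prompt_cost(prompt_tokens: int, requests: int, input_price: float,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> float:
    """USD cost of the system prompt over `requests` calls with caching.

    Assumes one cache write (first request) followed only by cache reads.
    """
    if requests < 1:
        return 0.0
    base = prompt_tokens * input_price / 1_000_000  # uncached cost of one send
    return base * write_multiplier + (requests - 1) * base * read_multiplier


def uncached_prompt_cost(prompt_tokens: int, requests: int, input_price: float) -> float:
    """USD cost of the system prompt over `requests` calls without caching."""
    return requests * prompt_tokens * input_price / 1_000_000


# Example: a 10,000-token system prompt sent 100 times at $3/MTok input.
with_cache = cached_prompt_cost(10_000, 100, 3.00)
without_cache = uncached_prompt_cost(10_000, 100, 3.00)
print(f"with caching:    ${with_cache:.4f}")
print(f"without caching: ${without_cache:.4f}")
print(f"savings on the system prompt: {1 - with_cache / without_cache:.0%}")
```

In this example the cached total is about $0.33 against $3.00 uncached, a saving of roughly 89%. With only two requests the saving drops to about a third, since the write surcharge is spread over fewer reads.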

Question 5

How accurate is the token count?

Accepted Answer

This tool uses a 4 characters-per-token approximation — the same heuristic most developers use for rough estimates. Actual tokenization may vary 5–10% depending on content type and the specific model's tokenizer. English prose is close to 4 chars/token; code and non-Latin scripts may differ. For precise Claude counts, use Anthropic's tokenizer tool.
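The heuristic above can be sketched in a few lines. The names `estimate_tokens` and `estimate_range` are hypothetical, rounding up is an assumption (the tool may round differently), and the 10% band simply reflects the upper end of the variation quoted above.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers differ by model and content


def estimate_tokens(text: str) -> int:
    """Approximate token count as the character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_range(text: str, tolerance_pct: int = 10) -> tuple[int, int]:
    """Low and high token bounds for a +/- tolerance_pct band around the estimate.

    Uses integer arithmetic to avoid floating-point rounding surprises.
    """
    n = estimate_tokens(text)
    low = n * (100 - tolerance_pct) // 100            # rounded down
    high = (n * (100 + tolerance_pct) + 99) // 100    # rounded up
    return low, high


prompt = "a" * 400  # stand-in for a 400-character prompt
print(estimate_tokens(prompt))   # 100
print(estimate_range(prompt))    # (90, 110)
```

Treat the range as a budgeting aid only: code and non-Latin scripts can fall outside it, so use the provider's own token counts when precision matters.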

Question 6

How do I reduce my LLM API bill?

Accepted Answer

1. Enable prompt caching: Claude saves 90% on repeated system prompts, GPT-4o saves 50%.
2. Downgrade the model for simple tasks: Gemini 2.0 Flash is 150x cheaper than Claude Opus per input token.
3. Shorten your system prompt: every token costs money on every uncached request.
4. Limit output length: use max_tokens to cap response length.
5. Use the Batch API: Anthropic's Message Batches API offers 50% off for async workloads.
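Two of these tips are easy to quantify. The sketch below bounds the output cost of a request capped with max_tokens and applies the 50% batch discount. Both helper names are hypothetical, and the prices are the Claude Sonnet 4.6 and input-price figures quoted on this page.

```python
def max_output_cost(max_tokens: int, output_price: float) -> float:
    """Upper bound in USD on output cost for one request capped at max_tokens."""
    return max_tokens * output_price / 1_000_000


def with_batch_discount(cost: float, discount: float = 0.50) -> float:
    """Cost in USD after a fractional batch discount (50% by default)."""
    return cost * (1 - discount)


# Tip 4: capping output at 1,000 tokens on a $15/MTok output model.
cap = max_output_cost(1_000, 15.00)
print(f"worst-case output cost per request: ${cap:.4f}")

# Tip 5: the same worst case when run through a 50%-off batch endpoint.
print(f"with batch discount: ${with_batch_discount(cap):.4f}")

# Tip 2: input price ratio between a $15/MTok and a $0.10/MTok model.
print(f"input price ratio: {15.00 / 0.10:.0f}x")
```

Here the cap limits output spend to at most $0.015 per request, or $0.0075 when batched. The cap only bounds the cost: responses that finish early are billed for the tokens actually generated.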

Model	Input ($/MTok)	Output ($/MTok)	Cache Read ($/MTok)	Context (tokens)
Anthropic
Claude Opus 4.7	$15.00	$75.00	$1.50	200k
Claude Sonnet 4.6 (popular)	$3.00	$15.00	$0.30	200k
Claude Haiku 4.5	$0.80	$4.00	$0.08	200k
OpenAI
GPT-4o (popular)	$2.50	$10.00	$1.25	128k
GPT-4o-mini (budget)	$0.15	$0.60	$0.075	128k
Google
Gemini 1.5 Pro	$1.25	$5.00	$0.3125	1M+
Gemini 2.0 Flash (cheapest)	$0.10	$0.40	$0.025	1M+

LLM API Pricing Calculator
