Question 1

How much does Claude prompt caching cost?

Accepted Answer

Claude prompt caching has two components: cache writes and cache reads. Cache writes cost 125% of the standard input rate (e.g., $1.00/MTok for Haiku 4.5, $3.75/MTok for Sonnet 4.6). Cache reads cost 10% of the standard input rate — a 90% discount (e.g., $0.08/MTok for Haiku 4.5, $0.30/MTok for Sonnet 4.6). Cache writes happen once per 5-minute TTL window; subsequent requests within that window read from cache at the discounted rate. For any app that re-sends the same context (system prompt, documents) more than once per TTL, caching almost always saves money.

Question 2

What is the minimum context size that benefits from prompt caching?

Accepted Answer

The minimum cacheable block size is 1,024 tokens for Claude Sonnet and Opus models, and 2,048 tokens for Claude Haiku models. Content shorter than the minimum is not cached even if cache_control headers are present. In practice, most system prompts (instructions + persona) are 500–2,000 tokens, and most tool-use / RAG document contexts are 2,000–50,000 tokens — well above the minimum. Caching becomes most impactful when the cached segment is large relative to the per-request variable portion.

Question 3

How do I enable prompt caching in the Claude API?

Accepted Answer

Add {"cache_control": {"type": "ephemeral"}} to the content block you want to cache. Most commonly: the last content block of your system prompt, or the last content block of a large document in the messages array. The cache TTL is 5 minutes. Example: {"role": "system", "content": [{"type": "text", "text": "<your-system-prompt>", "cache_control": {"type": "ephemeral"}}]}. You can have up to 4 cache breakpoints per request.

Question 4

Does prompt caching work with all Claude 4 models?

Accepted Answer

Yes — prompt caching is supported on all Claude 4 models: Haiku 4.5, Sonnet 4.6, and Opus 4.7. The minimum cached token sizes differ: 2,048 tokens for Haiku, 1,024 tokens for Sonnet and Opus. Cache TTL is 5 minutes for all models. Cache write cost is 125% of the standard input rate; cache read cost is 10% of the standard input rate (90% savings). The cache is per-API-key and per-model — different models don't share cache entries.

Question 5

How is Claude prompt caching different from OpenAI's cached inputs?

Accepted Answer

Claude's prompt caching uses explicit cache_control headers (you choose what to cache), while OpenAI's caching is automatic (OpenAI decides what to cache based on prompt prefixes). Claude cache reads cost 10% of standard input rate (90% discount); OpenAI cached inputs cost 50% of standard input rate. Claude has a 5-minute TTL with a write charge; OpenAI's caching is free to write and automatic. For apps with explicit, large fixed contexts, Claude's 90% discount typically outperforms OpenAI's 50% discount.

Model	Standard Input	Cache Write (1.25×)	Cache Read (0.10×)	Cache Savings	Min Cache Size
Claude Haiku 4.5	$0.80/MTok	$1.00/MTok	$0.08/MTok	90%	2,048 tokens
Claude Sonnet 4.6	$3.00/MTok	$3.75/MTok	$0.30/MTok	90%	1,024 tokens
Claude Opus 4.7	$15.00/MTok	$18.75/MTok	$1.50/MTok	90%	1,024 tokens

Model	Without Caching/day	With Caching/day	Savings/day	Savings/year
Haiku 4.5	$81.60	$12.08	$69.52 (85%)	$25,375
Sonnet 4.6	$306.00	$45.30	$260.70 (85%)	$95,156

Scenario	Cached Tokens	Haiku Breakeven	Sonnet Breakeven
Small system prompt (500 tokens)	500 tokens	2nd request	2nd request
Large system prompt (2K tokens)	2,000 tokens	2nd request	2nd request
Document context (20K tokens)	20,000 tokens	2nd request	2nd request
Full book (100K tokens)	100,000 tokens	2nd request	2nd request

Claude Prompt Caching Cost Calculator

Caching Pricing Rates

Real-World Savings: Chatbot Example

Claude Haiku 4.5 — Chatbot (10K conversations/day)

Claude Sonnet 4.6 — Chatbot (10K conversations/day)

RAG Pipeline Savings Example

How to Enable Prompt Caching

Python (system prompt caching)

Document + system prompt caching (RAG)

Caching vs No-Caching Breakeven

Calculate exact costs for your prompt

Frequently Asked Questions

How much does Claude prompt caching cost?

What is the minimum context size for Claude caching?

How do I enable prompt caching in the Claude API?

Does prompt caching work with all Claude 4 models?

How is Claude caching different from OpenAI's cached inputs?