Claude Prompt Caching Cost Calculator

Prompt caching saves 90% on repeated context. See exact pricing across all Claude 4 models and calculate your real-world savings.

Caching Pricing Rates

Cache writes occur once per 5-minute TTL window. All subsequent requests within that window read from cache at 10% of standard input cost.

Model Standard Input Cache Write (1.25×) Cache Read (0.10×) Cache Savings Min Cache Size
Claude Haiku 4.5 $0.80/MTok $1.00/MTok $0.08/MTok 90% 2,048 tokens
Claude Sonnet 4.6 $3.00/MTok $3.75/MTok $0.30/MTok 90% 1,024 tokens
Claude Opus 4.7 $15.00/MTok $18.75/MTok $1.50/MTok 90% 1,024 tokens

Real-World Savings: Chatbot Example

A chatbot with a 2,000-token system prompt, 300-token average user message, 200-token average response, 10,000 conversations/day, 3 turns per conversation:

Claude Haiku 4.5 — Chatbot (10K conversations/day)

Without Caching
$20.80
/day
With Caching
$3.68
/day
Daily Savings
$17.12
82% off
Annual Savings
$6,249
/year

Claude Sonnet 4.6 — Chatbot (10K conversations/day)

Without Caching
$78.00
/day
With Caching
$13.80
/day
Daily Savings
$64.20
82% off
Annual Savings
$23,433
/year

RAG Pipeline Savings Example

A document Q&A app that loads a 20,000-token document context per session, 500 questions/day, 10 turns per session:

Model Without Caching/day With Caching/day Savings/day Savings/year
Haiku 4.5 $81.60 $12.08 $69.52 (85%) $25,375
Sonnet 4.6 $306.00 $45.30 $260.70 (85%) $95,156

How to Enable Prompt Caching

Add cache_control to the last content block you want cached. You can have up to 4 cache breakpoints per request.

Python (system prompt caching)

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "You are a helpful assistant...",  # your system prompt
        "cache_control": {"type": "ephemeral"}  # ← add this
    }],
    messages=[{
        "role": "user",
        "content": user_message
    }]
)

# Check cache hit in response
print(response.usage)
# Usage(cache_creation_input_tokens=1247, cache_read_input_tokens=0, ...)
# Second call: cache_read_input_tokens=1247 (90% cheaper)

Document + system prompt caching (RAG)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Answer questions based on the document below.",
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": document_content,  # large document
                    "cache_control": {"type": "ephemeral"}  # ← cache the doc
                },
                {"type": "text", "text": user_question}
            ]
        }
    ]
)
The cache TTL is 5 minutes. For chatbots with active sessions, sessions staying within a 5-minute window will read from cache. For longer sessions or batch pipelines, structure your requests to keep the cached prefix constant and append variable content after the cache breakpoint.

Caching vs No-Caching Breakeven

Caching is always beneficial when you send the same context more than once within 5 minutes. The write cost is recovered on the second request:

Scenario Cached Tokens Haiku Breakeven Sonnet Breakeven
Small system prompt (500 tokens)500 tokens2nd request2nd request
Large system prompt (2K tokens)2,000 tokens2nd request2nd request
Document context (20K tokens)20,000 tokens2nd request2nd request
Full book (100K tokens)100,000 tokens2nd request2nd request

Cache writes cost 125% (1.25×) of standard input. Cache reads cost 10% (0.10×). You break even on the 2nd read — the write overhead of 0.25× is recouped immediately since reads save 90%. Starting from the 2nd request, every cache hit gives you 90% off that token segment.

Calculate exact costs for your prompt

Paste your system prompt into the calculator to see token counts, standard vs cached costs, and monthly savings estimates.

Frequently Asked Questions

How much does Claude prompt caching cost?

Cache writes: 125% of standard input rate (e.g., $1.00/MTok for Haiku, $3.75/MTok for Sonnet). Cache reads: 10% of standard input rate (e.g., $0.08/MTok for Haiku, $0.30/MTok for Sonnet). Output tokens are unchanged. From the 2nd request within the TTL window, every cached token costs 90% less.

What is the minimum context size for Claude caching?

Minimum cacheable block: 2,048 tokens for Haiku 4.5, 1,024 tokens for Sonnet 4.6 and Opus 4.7. Content shorter than the minimum is not cached even with cache_control headers present. Most system prompts and document contexts exceed this threshold.

How do I enable prompt caching in the Claude API?

Add "cache_control": {"type": "ephemeral"} to the content block you want to cache. Typically: the last block of your system prompt, or a large document in the messages array. You can have up to 4 cache breakpoints. Cache TTL is 5 minutes.

Does prompt caching work with all Claude 4 models?

Yes — all Claude 4 models (Haiku 4.5, Sonnet 4.6, Opus 4.7) support prompt caching. Rates and minimum sizes differ slightly. The 90% cache read discount applies to all models. Cache is per-API-key and per-model.

How is Claude caching different from OpenAI's cached inputs?

Claude uses explicit cache_control headers (you choose what to cache); OpenAI's caching is automatic. Claude cache reads cost 10% of standard (90% discount); OpenAI cached inputs cost 50%. Claude charges for cache writes (1.25×); OpenAI does not. For large fixed contexts, Claude's 90% discount typically outperforms OpenAI's 50%.