Claude Batch API Pricing 2026

Anthropic's Message Batches API halves the price of every Claude model for asynchronous workloads such as data pipelines, labeling, and evaluation runs.

Batch API Pricing vs Standard API

Every Claude model is 50% cheaper via the Batch API. The trade-off: results arrive within 24 hours rather than in real time.

Model | Standard Input | Batch Input | Standard Output | Batch Output | Savings
Claude Haiku 4.5 | $0.25/MTok | $0.125/MTok | $1.25/MTok | $0.625/MTok | 50% off
Claude Sonnet 4.6 | $3.00/MTok | $1.50/MTok | $15.00/MTok | $7.50/MTok | 50% off
Claude Opus 4.7 | $15.00/MTok | $7.50/MTok | $75.00/MTok | $37.50/MTok | 50% off

Stack with prompt caching: Batch API and prompt caching are independent discounts that can be combined. A cached read on a Sonnet batch request costs $0.30 × 50% = $0.15/MTok, which is 95% below the standard $3.00/MTok input rate. For document-processing pipelines that share a system prompt, combining both gives the largest input-cost reduction of the options compared on this page.
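
The stacked rate can be checked with a few lines of arithmetic. This is a minimal sketch, assuming cache reads are billed at 10% of the standard input rate (the 90% discount quoted on this page) and that the batch discount multiplies on top of it. The function name is illustrative, not part of any SDK.

```python
from decimal import Decimal

CACHE_READ_MULTIPLIER = Decimal("0.10")  # assumption: cache reads cost 10% of standard input
BATCH_MULTIPLIER = Decimal("0.50")       # Batch API: 50% off


def stacked_cached_read_rate(standard_input_per_mtok: str) -> Decimal:
    """Price per MTok for a cached read made inside a batch request."""
    standard = Decimal(standard_input_per_mtok)
    return standard * CACHE_READ_MULTIPLIER * BATCH_MULTIPLIER


sonnet_standard = Decimal("3.00")  # Sonnet standard input rate from the table above
rate = stacked_cached_read_rate("3.00")
saving = 1 - rate / sonnet_standard
print(f"${rate:.2f}/MTok, {saving:.0%} below the standard input rate")
```

Decimal is used instead of float so the small per-token prices do not pick up binary rounding noise.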

When to Use Batch API vs Standard API

✅ Use Batch API When...

  • Task is not user-facing
  • Can tolerate <24h completion
  • Processing large datasets
  • Running evaluations/CI
  • Bulk document classification
  • Data labeling pipelines
  • Content moderation at scale
  • Overnight analytics jobs

→ Use Standard API When...

  • Human waiting for response
  • Latency < 30 seconds needed
  • Building a chatbot or assistant
  • Streaming output required
  • Real-time code assistance
  • Interactive applications
  • Low-latency agentic loops
  • Production user-facing API
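
The two checklists above can be folded into a small routing helper. This is a sketch only: the `Workload` fields and the `choose_api` name are hypothetical, and the 24-hour threshold is taken from the completion window quoted on this page.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    user_facing: bool       # is a person waiting on the response?
    max_wait_hours: float   # longest acceptable wait for results
    needs_streaming: bool   # must output stream as it is generated?


def choose_api(workload: Workload) -> str:
    """Return "batch" or "standard" following the checklists above."""
    if workload.user_facing or workload.needs_streaming:
        return "standard"
    # Batch results can take up to 24 hours, so the job must tolerate that.
    if workload.max_wait_hours >= 24:
        return "batch"
    return "standard"


nightly_labeling = Workload(user_facing=False, max_wait_hours=48, needs_streaming=False)
chatbot = Workload(user_facing=True, max_wait_hours=0.01, needs_streaming=True)
print(choose_api(nightly_labeling), choose_api(chatbot))
```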

Monthly Savings Calculator: Real Examples

Use Case | Volume/Month | Standard Cost | Batch Cost | Monthly Saving
Document classification (Haiku) | 100M tokens | $25 | $12.50 | $12.50
Data labeling pipeline (Sonnet) | 1B tokens | $18,000 | $9,000 | $9,000
Eval suite runs (Sonnet) | 500M tokens | $9,000 | $4,500 | $4,500
Research analysis (Opus) | 100M tokens | $9,000 | $4,500 | $4,500
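
To run the same estimate on your own volumes, a short calculator helps. This is a minimal sketch using the standard rates listed on this page. The table does not say how each volume splits between input and output tokens, so the example assumes 1B input plus 1B output tokens, which reproduces the Sonnet labeling row. That split is an assumption, so substitute your own measured volumes.

```python
from decimal import Decimal

# Standard rates in USD per million tokens (input, output), copied from this page.
STANDARD_RATES = {
    "haiku": (Decimal("0.25"), Decimal("1.25")),
    "sonnet": (Decimal("3.00"), Decimal("15.00")),
    "opus": (Decimal("15.00"), Decimal("75.00")),
}
BATCH_MULTIPLIER = Decimal("0.5")  # Batch API: 50% off input and output


def monthly_cost(model, input_mtok, output_mtok, batch=False):
    """Monthly cost in USD for the given input/output volumes (in MTok)."""
    input_rate, output_rate = STANDARD_RATES[model]
    cost = Decimal(input_mtok) * input_rate + Decimal(output_mtok) * output_rate
    return cost * BATCH_MULTIPLIER if batch else cost


# Assumed split: 1,000 MTok in and 1,000 MTok out on Sonnet.
standard = monthly_cost("sonnet", 1000, 1000)
batched = monthly_cost("sonnet", 1000, 1000, batch=True)
print(f"standard ${standard:,.2f}, batch ${batched:,.2f}, saving ${standard - batched:,.2f}")
```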

How to Use the Batch API (Python)

    import anthropic

    # Reads the API key from the ANTHROPIC_API_KEY environment variable.
    client = anthropic.Anthropic()

    # `documents` is assumed to be a list of strings loaded elsewhere.
    # Create a batch of up to 100,000 requests.
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": f"doc-{i}",
                "params": {
                    "model": "claude-sonnet-4-6",
                    "max_tokens": 256,
                    "messages": [
                        {"role": "user", "content": f"Classify this document: {doc}"}
                    ],
                },
            }
            for i, doc in enumerate(documents)  # up to 100,000 docs
        ]
    )

    print(f"Batch ID: {batch.id}")  # poll this until processing_status == 'ended'
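
Once the batch is created, the remaining steps are polling and reading results. The sketch below assumes `client` is an `anthropic.Anthropic()` instance as in the example above. The helper names and the 60-second poll interval are illustrative choices, and the result handling assumes each successful response begins with a text block.

```python
import time


def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll until the batch reports processing_status == "ended"."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)


def collect_results(client, batch_id):
    """Return ({custom_id: text} for succeeded requests, [(custom_id, type)] for the rest)."""
    texts, failures = {}, []
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type == "succeeded":
            # Assumes the first content block of the response is text.
            texts[entry.custom_id] = entry.result.message.content[0].text
        else:
            # Other result types: "errored", "canceled", "expired".
            failures.append((entry.custom_id, entry.result.type))
    return texts, failures
```

Typical use would be `wait_for_batch(client, batch.id)` followed by `texts, failures = collect_results(client, batch.id)`. Results are matched on `custom_id` rather than position, since they are not guaranteed to come back in submission order.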

Batch API vs Prompt Caching — Which Saves More?

Optimization | Discount | Best For | Latency Impact
Prompt caching | 90% on cache reads | Repeated system prompts / context | None (real-time)
Batch API | 50% on everything | Any async/offline workload | Up to 24 hours
Both combined | Up to 95% on cached input | Large-batch pipelines with shared context | Up to 24 hours
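
Which option saves more on input depends on how much of each request is served from cache. The sketch below blends the rates by cached fraction. It assumes cache reads cost 10% of the standard input rate and ignores any surcharge for writing to the cache, so treat the results as a rough guide. The function name is illustrative.

```python
from decimal import Decimal


def effective_input_rate(standard, cached_fraction, use_cache, use_batch):
    """Blended input price per MTok.

    cached_fraction is the share of input tokens served as cache reads (0 to 1).
    Assumes cache reads cost 10% of standard and ignores cache-write surcharges.
    """
    standard = Decimal(standard)
    fraction = Decimal(cached_fraction) if use_cache else Decimal(0)
    rate = fraction * standard * Decimal("0.10") + (1 - fraction) * standard
    return rate * Decimal("0.5") if use_batch else rate


# Sonnet standard input rate ($3.00/MTok) at three cache-hit levels.
for fraction in ("0.25", "0.50", "0.75"):
    caching = effective_input_rate("3.00", fraction, use_cache=True, use_batch=False)
    batch = effective_input_rate("3.00", fraction, use_cache=False, use_batch=True)
    both = effective_input_rate("3.00", fraction, use_cache=True, use_batch=True)
    print(f"cached {fraction}: caching ${caching:.2f}, batch ${batch:.2f}, both ${both:.2f}")
```

Under these assumptions, caching alone beats batch alone on input cost only when more than about 56% of input tokens are cache reads (the point where 1 − 0.9f drops below 0.5). Output tokens are discounted only by the Batch API.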

Frequently Asked Questions

What is the maximum batch size?

Each batch can contain up to 100,000 individual requests or 256 MB of total payload, whichever limit is reached first. Each request supports the full model context window (up to 200K tokens). Typical completion time ranges from a few minutes for small batches to a few hours for batches close to 100,000 requests.
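
Workloads larger than one batch need to be split before submission. This is a minimal sketch of a chunking helper. The name `chunk_requests` is illustrative, and it only counts requests, so any payload-size limit on a batch would need a separate check.

```python
from itertools import islice

MAX_REQUESTS_PER_BATCH = 100_000  # per-batch request limit quoted above


def chunk_requests(requests, size=MAX_REQUESTS_PER_BATCH):
    """Yield lists of at most `size` requests, preserving order."""
    iterator = iter(requests)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Example: 250,000 placeholder items split into three batches.
sizes = [len(chunk) for chunk in chunk_requests(range(250_000))]
print(sizes)
```

Each chunk can then be passed as `requests=chunk` to its own `client.messages.batches.create(...)` call.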

Is Batch API available in all regions?

The Batch API is available in all regions where the standard Claude API is available. Batch results remain available for 29 days after the batch is created, so download them before they expire.

Can I cancel a batch job?

Yes. Use the client.messages.batches.cancel(batch_id) method to cancel an in-progress batch. Requests already processed are billed; pending requests are not.
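
Cancellation is not instant, so it is worth waiting for the batch to finish winding down before checking what was processed. The sketch below assumes `client` is an `anthropic.Anthropic()` instance. The helper name and poll interval are illustrative.

```python
import time


def cancel_and_summarize(client, batch_id, poll_seconds=30):
    """Request cancellation, wait for the batch to end, return (succeeded, canceled) counts."""
    client.messages.batches.cancel(batch_id)
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            break
        time.sleep(poll_seconds)
    counts = batch.request_counts
    # Succeeded requests finished before cancellation took effect; canceled ones never ran.
    return counts.succeeded, counts.canceled
```

The first number returned is the count of requests that completed, which are the ones billed according to the answer above.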
