OpenAI o3 API Pricing 2026

o3 at $10/MTok input — the frontier reasoning model. Full cost comparison vs Claude Opus 4.7, o4-mini, and Gemini. Plus: when o3 actually justifies its price premium.

OpenAI o3 Pricing — All Tiers

Model Input (per 1M tokens) Output (per 1M tokens) Cache Read Notes
o3 Frontier reasoning $10.00 $40.00 $2.50 (75% off) Best-in-class math, science, code reasoning
o4-mini Best value reasoning $1.10 $4.40 $0.275 (75% off) ~9× cheaper than o3, handles most hard tasks
o1 $15.00 $60.00 $7.50 (50% off) Legacy reasoning model; o3 supersedes it

o3 vs Claude Opus vs Gemini Ultra — Full Comparison

Model Input Output Cache Read Context Extended Thinking
o4-mini Cheapest reasoning $1.10 $4.40 $0.275 128k tokens Yes (counted in output)
Gemini 2.0 Flash Thinking $0.10 $3.50 $0.025 1M tokens Yes (built-in)
Claude Sonnet 4.6 $3.00 $15.00 $0.30 (90% off) 200k tokens Yes (extended thinking)
o3 This page $10.00 $40.00 $2.50 (75% off) 128k tokens Yes (built-in CoT)
Claude Opus 4.7 Premium $15.00 $75.00 $1.50 (90% off) 200k tokens Yes (extended thinking)

The caching gap matters at scale: Claude Sonnet with 90% caching costs $0.30/MTok for cache reads. o3 at 75% off costs $2.50/MTok. For apps sending a large repeated system prompt, Sonnet's caching advantage alone can make it cheaper than o3 — even though o3's sticker price is only 3× higher. Run the calculator with your actual prompt to see real numbers.

o3 Tradeoffs — When Does It Actually Win?

Hard math & science: o3 wins

o3 achieves near-human performance on AIME 2024 (American Math Olympiad) and outperforms Claude Opus on PhysicsBench and SWE-bench-verified. For problems where correctness on genuinely hard reasoning problems matters, o3's quality ceiling is higher.

Agentic workflows: Claude wins

Claude Opus and Sonnet handle multi-step tool use, complex instruction following, and long-horizon agent tasks more reliably. Claude's 200k context window (vs o3's 128k) is also an advantage for large-context agent tasks like full codebase analysis.

Cost at scale: o4-mini almost always wins

At 9× o3's price, o4-mini handles 80–90% of hard reasoning use cases adequately. Only choose o3 when you've confirmed o4-mini is genuinely quality-limited for your specific task — not just theoretically.

Reliability & ecosystem: OpenAI advantage

o3 is available through Azure OpenAI Service with enterprise SLAs, SOC 2, HIPAA BAA. Anthropic Claude is also available via AWS Bedrock and GCP Vertex AI with comparable compliance certifications. Both are production-grade.

When to Choose o3 vs Alternatives

Scenario Recommendation Reason
Math olympiad / competitive programming o3 Highest ceiling on genuinely hard formal reasoning; o4-mini may underperform
Hard reasoning, budget matters o4-mini 9× cheaper than o3; handles most reasoning tasks well — test this first
Multi-step agentic workflows Claude Sonnet 4.6 Better instruction following, 200k context, deeper caching discount, tool use reliability
Large repeated system prompt Claude Sonnet/Opus 90% cache discount (vs o3's 75%) makes Claude cheaper per effective token on cache-heavy workloads
Fast, cheap inference at scale Gemini Flash / Claude Haiku Both are <$1/MTok; use reasoning models only when quality demands it
Enterprise compliance required o3 or Claude Opus Both available with HIPAA, SOC 2, and Azure/AWS/GCP enterprise deployment options

Monthly Cost at Scale — o3 vs Claude Opus vs o4-mini

For a hard reasoning pipeline processing 10M input tokens/month with 5M output tokens:

Model Input Cost Output Cost Monthly Total Notes
o4-mini $11 $22 $33 Try this first before paying o3 rates
Claude Sonnet 4.6 (90% cache) $30 (blended) $75 $105 10M input, assume 90% cached at $0.30/MTok
o3 (no cache) $100 $200 $300 Sticker price with no caching
o3 (75% cache hit) $32.50 $200 $232.50 Blended input at 75% cache rate
Claude Opus 4.7 (90% cache) $16.50 $375 $391.50 Higher output cost offsets caching benefit

Calculate Your o3 vs Claude Cost

Paste your actual prompt to see exact token counts and compare o3, o4-mini, Claude Sonnet, and Claude Opus with realistic cache scenarios.

Open the LLM Pricing Calculator →

Frequently Asked Questions

How much does o3 cost per API call?

There's no fixed per-call cost — o3 charges per token consumed. A typical hard reasoning prompt of 2,000 tokens input with 1,500 tokens output (including thinking) costs approximately $0.02 input + $0.06 output = $0.08 per call. Heavy reasoning tasks with long thinking chains can produce 5,000–10,000 output tokens, pushing individual call cost to $0.20–$0.40. Use the calculator above with your actual prompt to get accurate estimates.

Is o3 available on Azure OpenAI?

Yes. o3 is available through Azure OpenAI Service with enterprise SLAs, data residency controls, HIPAA BAA, and SOC 2 compliance. Azure pricing matches OpenAI direct pricing for o3 ($10/$40 MTok input/output). If you need enterprise compliance, private deployment, or VNet integration, Azure OpenAI is the recommended path. Deployment is available in US, EU, and APAC Azure regions.

Does o3 support function calling and structured outputs?

Yes. o3 supports OpenAI's function calling (tools) and JSON mode for structured outputs. However, like all reasoning models, o3 may use additional thinking tokens before calling a function, increasing per-call cost compared to non-reasoning models like GPT-4o. If you only need structured extraction (no hard reasoning), GPT-4o or Claude Haiku at much lower cost are better choices. Use o3 for tool-use when the decision of *which* tool to call requires genuine multi-step reasoning.

What's the difference between o3 and o3-mini?

o3-mini was OpenAI's lightweight reasoning model that was later superseded by o4-mini. If you're seeing o3-mini pricing ($1.10/MTok input, $4.40/MTok output) referenced, those prices now apply to o4-mini, which is OpenAI's current recommended affordable reasoning model. o3 (without the -mini suffix) is the full frontier reasoning model at $10/MTok input. When evaluating costs, use o4-mini as the budget reasoning baseline, not o3-mini (which is effectively the same model, renamed).

Should I switch from Claude Opus to o3?

It depends on your workload. o3 is cheaper per token ($10 vs $15/MTok input) but lacks Claude Opus's 90% prompt caching discount (o3 only offers 75% off). For instruction-following-heavy agentic workflows, Claude Opus typically outperforms. For pure hard reasoning tasks with minimal repeated context, o3 is often better value. The practical answer: run A/B evals on your actual task and measure quality + cost together — don't switch based on benchmark scores alone.

Also see: OpenAI o4-mini Pricing · Claude Opus Pricing · Claude Sonnet Pricing · GPT-4o vs Claude Cost · LLM Cost Comparison 2026