⚡ Fastest LLM Inference

Groq API Pricing 2026

Complete pricing for Groq Cloud's LPU-accelerated LLaMA and Mixtral inference — plus head-to-head comparisons with Claude, GPT-4o, and Mistral.

Groq Model Pricing Table

Groq bills input and output tokens separately, in millions of tokens (MTok).

ModelInput (MTok)Output (MTok)ContextSpeed (tok/s)
LLaMA 3.1 8B$0.05$0.08128K~800
LLaMA 3.2 11B Vision$0.18$0.18128K~600
LLaMA 3.1 70B$0.59$0.79128K~330
LLaMA 3.3 70B$0.59$0.99128K~300
Mixtral 8x7B$0.24$0.2432K~480
Gemma 2 9B$0.20$0.208K~500
LLaMA 3.1 405B$2.99$2.99128K~100

Groq vs Claude vs GPT-4o vs Mistral

Full market comparison across the entire LLM pricing spectrum.

ModelProviderInput (MTok)Output (MTok)Model Type
LLaMA 3.1 8BGroq$0.05$0.08Open-weight
GPT-4o-miniOpenAI$0.15$0.60Frontier (small)
Claude Haiku 4.5Anthropic$0.25$1.25Frontier (small)
LLaMA 3.1 70BGroq$0.59$0.79Open-weight
Mistral LargeMistral$2.00$6.00Frontier (mid)
GPT-4oOpenAI$2.50$10.00Frontier
Claude Sonnet 4.6Anthropic$3.00$15.00Frontier
Claude Opus 4.7Anthropic$15.00$75.00Frontier (max)
Speed vs quality trade-off: Groq is 5–10× faster and 3–10× cheaper than Claude or GPT-4o. But it only serves open-weight models (LLaMA, Mixtral, Gemma). On complex coding, reasoning, and multi-step agent tasks, Claude Sonnet reliably outperforms LLaMA 70B. Run your own benchmark on your specific task before choosing.

Groq's Key Differentiator: Raw Speed

Token Generation Speed

500–800 tok/s

LLaMA 3.1 8B on Groq generates 10× faster than typical GPU-based providers, enabling real-time voice and streaming use cases.

Time to First Token

<100ms

Groq's LPU architecture delivers near-instant time-to-first-token — critical for user-facing streaming chat interfaces.

Cost at Scale

$0.05/MTok

LLaMA 8B at $0.05/MTok is among the cheapest available inference for non-trivial models. 100M tokens costs just $5.

Free Tier

14,400 req/day

Groq's free tier is one of the most generous in the industry — 14,400 requests/day across all models. No credit card needed.

When Should You Use Groq vs Claude?

Use CaseBest ChoiceWhy
Real-time voice / audio transcriptionGroqGroq Whisper at $0.111/hr, ultra-low latency
Streaming chatbot (cost-sensitive)Groq (LLaMA 70B)5× cheaper, 5× faster than Claude Sonnet
Complex reasoning / coding agentClaude Sonnet/OpusFrontier model quality matters here
Bulk classification / labelingGroq (LLaMA 8B)$0.05/MTok, high throughput
Multi-step agentic pipelineClaude SonnetBetter tool use, instruction following
RAG over short documentsGroq (Mixtral 8x7B)32K context, fast, cheap
Legal / medical analysisClaude OpusAccuracy stakes too high for open-weight
Prototyping / developmentGroq (free tier)14,400 req/day free, no credit card

Monthly Cost at 1 Billion Tokens

ModelInput Cost (500M tok)Output Cost (500M tok)Total/Month
Groq LLaMA 3.1 8B$25$40$65
Groq Mixtral 8x7B$120$120$240
Groq LLaMA 3.1 70B$295$395$690
Claude Haiku (cached)$12.50 cached$625~$638
Claude Sonnet (standard)$1,500$7,500$9,000
GPT-4o$1,250$5,000$6,250

Frequently Asked Questions

Is Groq the same as Grok (xAI)?

No. Groq (groq.com) is an AI infrastructure company that builds custom LPU chips for fast LLM inference. Grok (x.ai) is a large language model developed by Elon Musk's xAI. They are completely different companies. Groq serves LLaMA/Mixtral/Gemma models. xAI's Grok model is available via the X (Twitter) API separately.

Does Groq support function calling / tool use?

Yes. LLaMA 3.1 70B, LLaMA 3.3 70B, and Mixtral 8x7B on Groq support function calling (tool use). The implementation follows the OpenAI function calling schema, making it easy to switch from OpenAI to Groq with minimal code changes. Quality and reliability of function calling is generally lower than Claude's tool use.

What are Groq's rate limits?

Free tier: 30 RPM, 14,400 requests/day per model. Paid tier rate limits scale with usage spend. Groq's free tier is among the most generous of any LLM provider — ideal for prototyping and low-volume production. For enterprise volume (>1M requests/day), contact Groq for dedicated capacity.

Calculate the exact cost for your specific token volume across Groq, Claude, and GPT-4o.

Open Pricing Calculator →