AI model pricing changes fast. New models launch monthly, prices drop, and old models get deprecated. This guide cuts through the noise with verified, side-by-side pricing for every major LLM provider as of March 2026, so you can pick the right model for your budget and use case.
💡 Pricing moves fast. All prices on this page are sourced from official provider pricing pages and verified as of March 21, 2026. We update this guide regularly. Bookmark it.
TL;DR: Quick Comparison Table
All prices per 1 million tokens. Sorted by input cost, cheapest first.
| Model | Provider | Input / 1M | Output / 1M | Cached Input | Context |
|---|---|---|---|---|---|
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | – | 128K |
| Llama 4 Maverick | Together | $0.20 | $0.60 | varies | 1M |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | $0.02 | 270K |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | $0.028 | 128K |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | $0.03 | 1M |
| Gemini 3 Flash | Google | $0.50 | $3.00 | $0.05 | 1M |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | $0.075 | 270K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $0.10 | 200K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | $0.13 | 1M |
| Magistral Medium | Mistral | $2.00 | $6.00 | – | 40K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | $0.20 | 1M |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | $0.25 | 270K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.30 | 1M |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | $0.50 | 1M |
🏆 Cheapest frontier-class model: DeepSeek V3.2 at $0.28 input / $0.42 output – nearly 10× cheaper than comparable models from OpenAI and Anthropic. But cheapest ≠ best. Read on.
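To see what these per-token rates mean per request, here's a minimal sketch in Python. The prices are hardcoded from the table above; the dictionary keys are illustrative labels, not provider API model identifiers.

```python
# Prices per 1M tokens (input, output), copied from the comparison table above.
PRICES = {
    "deepseek-v3.2":     (0.28, 0.42),
    "gpt-5.4-nano":      (0.20, 1.25),
    "gemini-2.5-flash":  (0.30, 2.50),
    "claude-sonnet-4.6": (3.00, 15.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in dollars for one request at standard (uncached) rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A typical chat turn: 2K prompt tokens in, 500 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f}")
```

Even at this small request size, the spread between the cheapest and priciest models is roughly 10×, which is why the routing and caching strategies later in this guide matter so much.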
Don't want to do the math?
Use our free AI Model Cost Calculator to estimate your monthly costs based on actual usage patterns.
Try the Calculator →
OpenAI Pricing Breakdown
OpenAI's current lineup centers on the GPT-5.4 family, which replaced GPT-4o and o-series models. The three tiers cover everything from lightweight classification to frontier-class reasoning.
| Model | Input / 1M | Cached Input | Output / 1M | Best For |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | Complex reasoning, professional work |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | Coding, agents, sub-tasks |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 | High-volume, simple tasks |
Key Details
- Context window: All GPT-5.4 models support up to 270K tokens at standard pricing.
- Batch API: 50% discount on both input and output when using the asynchronous Batch API (24-hour turnaround).
- Flex processing: Lower costs for non-production workloads in exchange for slower response times.
- Caching: 90% discount on cached inputs across all models. Automatic – no configuration needed.
- Data residency: Regional processing endpoints add 10% for models released after March 5, 2026.
⚠️ Legacy note: GPT-4o, o1, o3, and o3-mini have been deprecated in favor of the unified GPT-5.4 family. If you're still on older models, migration is recommended – the new models are both cheaper and more capable.
Anthropic (Claude) Pricing Breakdown
Anthropic's current generation is the Claude 4.6 series (Opus and Sonnet) plus Claude Haiku 4.5 as the fast/cheap option. Claude is known for strong coding, long-context, and instruction following.
| Model | Input / 1M | Cached Input | Output / 1M | Context | Max Output |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $0.50 | $25.00 | 1M | 128K |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | 1M | 64K |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | 200K | 64K |
Key Details
- Extended thinking: Available on all current models. Thinking tokens are billed at output rates.
- Prompt caching: 90% discount on cached inputs. Cache writes cost 25% more than base input price. Minimum 1,024 tokens to cache.
- Batch API: 50% discount on both input and output tokens. Results within 24 hours.
- 1M context: Opus 4.6 and Sonnet 4.6 support 1M tokens natively. Haiku 4.5 tops out at 200K.
- Available on: Anthropic API, AWS Bedrock, Google Vertex AI.
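The cache-write premium changes the arithmetic slightly: the first request pays 1.25× the base input rate on the cached prefix, and every later hit pays 0.1×, so caching pays for itself from the second request onward. A sketch using Sonnet 4.6's rates from the table (illustrative arithmetic, not an API call):

```python
BASE_INPUT = 3.00                 # Claude Sonnet 4.6, $ per 1M input tokens
CACHE_WRITE = BASE_INPUT * 1.25   # cache writes cost 25% more than base input
CACHE_HIT = BASE_INPUT * 0.10     # cache hits get the 90% discount

def cached_cost(prefix_tokens, calls):
    """Total prefix cost: one cache write, then (calls - 1) cache hits."""
    per_million = CACHE_WRITE + (calls - 1) * CACHE_HIT
    return prefix_tokens * per_million / 1_000_000

def uncached_cost(prefix_tokens, calls):
    """Total prefix cost paying the base input rate every time."""
    return prefix_tokens * calls * BASE_INPUT / 1_000_000

# A 2,048-token system prompt (above the 1,024-token caching minimum):
for n in (1, 2, 10):
    print(n, round(uncached_cost(2048, n), 6), round(cached_cost(2048, n), 6))
```

For a single call, caching is slightly more expensive because of the write premium; from the second call onward it is cheaper, converging toward the full 90% saving as reuse grows.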
Legacy Models Still Available
| Model | Input / 1M | Output / 1M | Status |
|---|---|---|---|
| Claude Sonnet 4.5 | $3.00 | $15.00 | Active (legacy) |
| Claude Opus 4.5 | $5.00 | $25.00 | Active (legacy) |
| Claude Sonnet 4 | $3.00 | $15.00 | Active (legacy) |
| Claude Opus 4 | $15.00 | $75.00 | Active (legacy) |
💡 Pro tip: Claude Opus 4 at $15/$75 is 3× more expensive than Opus 4.6 at $5/$25. If you're still on an older Opus model, upgrading saves money and gets you a better model.
Google (Gemini) Pricing Breakdown
Google runs two current generations: the new Gemini 3 series (preview) and the stable Gemini 2.5 series. Pricing is through Google AI Studio (consumer) or Vertex AI (enterprise). Prices below are Vertex AI standard tier.
| Model | Input / 1M | Cached Input | Output / 1M | Context |
|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 | $0.20 | $12.00 | 1M+ |
| Gemini 3 Pro (preview) | $2.00 | $0.20 | $12.00 | 1M+ |
| Gemini 3 Flash (preview) | $0.50 | $0.05 | $3.00 | 1M+ |
| Gemini 3.1 Flash-Lite (preview) | $0.25 | $0.03 | $1.50 | 1M+ |
| Gemini 2.5 Pro | $1.25 | $0.13 | $10.00 | 1M |
| Gemini 2.5 Flash | $0.30 | $0.03 | $2.50 | 1M |
Key Details
- Long context surcharge: Inputs over 200K tokens are charged at 2× the base rate for Pro models.
- Batch/Flex pricing: 50% discount on both input and output for asynchronous batch processing.
- Priority tier: 80% premium for guaranteed high-speed processing.
- Free tier: Google AI Studio offers generous free usage for Gemini 2.5 Flash (rate-limited). Great for prototyping.
- Grounding: Web search grounding is $14 per 1,000 queries after 5,000 free/month.
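The long-context surcharge makes Gemini Pro input costs step up sharply at the 200K boundary. Here's a sketch of one plausible reading of the rule, where the 2× rate applies to the entire prompt once it crosses the threshold; verify the exact tiering on the official pricing page before relying on it.

```python
BASE_RATE = 1.25      # Gemini 2.5 Pro, $ per 1M input tokens
THRESHOLD = 200_000   # long-context surcharge kicks in above this

def gemini_pro_input_cost(input_tokens):
    """Input cost in dollars; prompts over 200K tokens are billed at 2x
    the base rate. Assumes the 2x rate applies to the whole prompt, not
    just the excess -- confirm against the official pricing page."""
    rate = BASE_RATE * 2 if input_tokens > THRESHOLD else BASE_RATE
    return input_tokens * rate / 1_000_000

print(gemini_pro_input_cost(150_000))  # below the threshold: base rate
print(gemini_pro_input_cost(400_000))  # above it: 2x rate on all tokens
```

The practical takeaway: if your documents hover near 200K tokens, chunking them just under the threshold can halve the input bill.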
📌 Note: Gemini 2.5 Flash at $0.30/$2.50 is one of the best value-for-money models available. Huge 1M context window, strong reasoning, and thinking tokens are billed at the same output rate.
DeepSeek Pricing Breakdown
DeepSeek continues to be the price disruptor of the LLM market. Their V3.2 model unifies both chat and reasoning under a single endpoint, with the cheapest per-token pricing of any frontier-class model.
| Model | Input / 1M | Cached Input | Output / 1M | Context |
|---|---|---|---|---|
| DeepSeek V3.2 (deepseek-chat) | $0.28 | $0.028 | $0.42 | 128K |
| DeepSeek V3.2 Reasoning (deepseek-reasoner) | $0.28 | $0.028 | $0.42 | 128K |
Key Details
- Same pricing: Both chat and reasoning modes cost exactly the same – $0.28/$0.42.
- Cache discount: 90% off for cache hits, matching OpenAI and Anthropic's caching ratios.
- Reasoning mode: Supports up to 64K output tokens; chat mode defaults to 4K (max 8K).
- FIM completion: Fill-in-the-middle available for code editing (chat mode only).
- Availability: DeepSeek API can have availability issues during peak demand. Consider third-party hosts for production.
⚠️ Caveat: DeepSeek's pricing is unbeatable, but consider latency, rate limits, and reliability for production workloads. The API can be slow or unavailable during peak hours. Many teams use DeepSeek for batch/offline work and a more reliable provider for real-time.
Mistral Pricing Breakdown
Mistral has rebranded around their Magistral (reasoning), Mistral Small (general), and Devstral (code) families. They offer competitive pricing, especially at the small model tier.
| Model | Input / 1M | Output / 1M | Context | Best For |
|---|---|---|---|---|
| Magistral Medium 1.2 | $2.00 | $6.00 | 40K | Complex reasoning |
| Magistral Small | $0.50 | $1.50 | 40K | Lightweight reasoning |
| Mistral Small 3.1 | $0.10 | $0.30 | 128K | General purpose, agents |
| Devstral Small 2 | $0.50 | $1.50 | 128K | Code generation, agents |
| Ministral 8B | $0.10 | $0.10 | 128K | Edge, classification |
| Ministral 3B | $0.04 | $0.04 | 128K | Ultra-cheap, simple tasks |
Key Details
- Mistral Small 3.1: At $0.10/$0.30, this is one of the cheapest capable models available. Open weights (Apache 2.0), so you can self-host too.
- Ministral family: Purpose-built for edge and embedded use cases. Ministral 3B at $0.04/1M tokens is effectively free for most use cases.
- Batch API: Available on all models for async processing at reduced rates.
- Open source: Many Mistral models are open-weight, meaning you can self-host to avoid per-token costs entirely.
Open Source Models (Llama & Others)
Open-source models don't have a single "official" price โ it depends on your hosting provider. Here's what major inference providers charge for Meta's Llama 4 and other popular open models:
| Model | Provider | Input / 1M | Output / 1M |
|---|---|---|---|
| Llama 4 Maverick (400B MoE) | Together | $0.20 | $0.60 |
| Llama 4 Maverick (400B MoE) | Fireworks | $0.22 | $0.88 |
| Llama 4 Scout (109B MoE) | Together | $0.10 | $0.30 |
| Llama 4 Scout (109B MoE) | Fireworks | $0.15 | $0.60 |
| Llama 3.3 70B | Together | $0.10 | $0.30 |
| DeepSeek R1 (hosted) | Together | $0.17 | $0.51 |
| Qwen 3 235B MoE | Fireworks | $0.20 | $0.60 |
Self-Hosting vs. Inference APIs
- Inference APIs (Together, Fireworks, Groq, etc.): Pay per token. No infrastructure management. Best for variable or moderate workloads.
- Self-hosting (vLLM, TGI, Ollama): Pay for compute (GPU rental). Cost-effective at high volumes – a single A100 running a quantized Llama 3.3 70B can serve thousands of requests/hour for ~$2/hr.
- Break-even point: Self-hosting typically beats API pricing at roughly 50M+ tokens/day for a 70B model, though the exact threshold depends heavily on your blended API rate and GPU utilization.
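That break-even figure is worth sanity-checking against your own numbers, since it is highly sensitive to the blended API rate you would otherwise pay. A rough sketch, assuming a hypothetical $2/hr GPU at full utilization and ignoring ops overhead:

```python
def break_even_tokens_per_day(gpu_dollars_per_day, api_dollars_per_1m):
    """Daily token volume above which a rented GPU beats per-token API
    pricing. Ignores ops overhead and assumes full GPU utilization."""
    return gpu_dollars_per_day / api_dollars_per_1m * 1_000_000

# One ~$2/hr GPU ($48/day) vs. Together's Llama 3.3 70B at a 3:1
# input:output mix (blended: 0.75 * $0.10 + 0.25 * $0.30 = $0.15 per 1M):
print(break_even_tokens_per_day(48, 0.15))   # ~320M tokens/day

# Against a pricier hosted model at ~$1.00/1M blended, break-even drops:
print(break_even_tokens_per_day(48, 1.00))   # 48M tokens/day
```

As the two cases show, self-hosting pays off much sooner when the alternative is a mid-priced API than when it's a rock-bottom open-weight host.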
💡 Tip: Providers like Groq offer extremely fast inference (LPU hardware) with competitive pricing. If latency matters more than cost, check their current rates for Llama and Mistral models.
Cost Optimization Tips
The model you choose is only half the equation. How you use it matters just as much. These strategies can cut your LLM costs by 50–90%:
1. Prompt Caching
Every major provider now offers automatic prompt caching. If your requests share a common system prompt or context prefix, cached tokens are 90% cheaper. This is the single biggest cost lever for most applications.
Example: Chatbot with 4K system prompt
Without caching: 4,000 tokens × $3.00/1M = $0.012 per request
With caching: 4,000 tokens × $0.30/1M = $0.0012 per request
Savings: 90% on system prompt tokens – adds up fast at scale.
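The same arithmetic as a reusable helper. The $3.00/1M figure is Claude Sonnet 4.6's input rate from the tables above; the flat 90% cached-input discount matches the providers discussed here, but check your provider's exact ratio.

```python
def prompt_cost(tokens, rate_per_1m, cached=False):
    """Input cost in dollars for a prompt prefix.
    Cached tokens are assumed to get a 90% discount."""
    rate = rate_per_1m * 0.10 if cached else rate_per_1m
    return tokens * rate / 1_000_000

print(round(prompt_cost(4_000, 3.00), 6))               # uncached: 0.012
print(round(prompt_cost(4_000, 3.00, cached=True), 6))  # cached:   0.0012
```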
2. Model Routing
Don't use a $15/1M output model for tasks a $0.30/1M model can handle. Route requests to the cheapest capable model for each task:
- Classification, extraction, simple Q&A → GPT-5.4 nano, Mistral Small, or Gemini Flash
- Code generation, analysis, drafting → Claude Sonnet 4.6, GPT-5.4 mini, or Gemini 2.5 Pro
- Complex reasoning, research, novel problems → Claude Opus 4.6, GPT-5.4, or Gemini 3 Pro
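A minimal routing sketch of the tiers above. The task categories and model labels are illustrative strings, not provider API identifiers; real routers often add a confidence check that escalates to a stronger model when the cheap one fails.

```python
# Cheapest-capable-model routing, following the tiers above.
ROUTES = {
    "classification": "gpt-5.4-nano",      # simple, high-volume
    "extraction":     "mistral-small-3.1",
    "code":           "claude-sonnet-4.6",
    "drafting":       "gpt-5.4-mini",
    "research":       "claude-opus-4.6",   # frontier reasoning only when needed
}

def route(task_type, default="gemini-2.5-flash"):
    """Pick the cheapest model considered capable for this task type,
    falling back to a balanced default for unknown tasks."""
    return ROUTES.get(task_type, default)

print(route("classification"))  # cheap tier
print(route("poetry"))          # unknown task -> balanced fallback
```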
3. Batch Processing
If you don't need real-time responses, use batch APIs. OpenAI, Anthropic, and Google all offer 50% off for async batch jobs:
- OpenAI Batch API: 50% discount, 24-hour window
- Anthropic Message Batches: 50% discount, 24-hour window
- Google Vertex Batch/Flex: 50% discount
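Applied to a monthly workload, the 50% batch discount looks like this; the request volumes are hypothetical, and the rates are GPT-5.4 mini's from the OpenAI table above.

```python
def monthly_cost(requests, in_tok, out_tok, in_rate, out_rate, batch=False):
    """Monthly cost in dollars. Batch jobs get 50% off input and output."""
    cost = requests * (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return cost / 2 if batch else cost

# 1M requests/month on GPT-5.4 mini ($0.75 in / $4.50 out),
# averaging 1K input tokens and 300 output tokens per request:
realtime = monthly_cost(1_000_000, 1_000, 300, 0.75, 4.50)
batched  = monthly_cost(1_000_000, 1_000, 300, 0.75, 4.50, batch=True)
print(realtime, batched)  # 2100.0 1050.0
```

At this volume the discount is worth over a thousand dollars a month, purely for tolerating a 24-hour turnaround.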
4. Prompt Engineering
- Be concise: Every token costs money. Remove filler from prompts.
- Use structured output: JSON mode prevents rambling responses – fewer output tokens.
- Set max_tokens: Cap output length to prevent runaway generation.
- Compress context: Summarize conversation history instead of sending full transcripts.
5. Avoid Over-Thinking
Models with reasoning/thinking (Claude extended thinking, Gemini's thinking, GPT-5.4 reasoning) produce internal reasoning tokens billed at output rates. For simple tasks, disable thinking mode to avoid paying for unnecessary chain-of-thought tokens.
See exactly how much you'll spend
Plug in your actual usage numbers and compare costs across all providers side by side.
Open the Cost Calculator →
Which Model Should You Use?
There's no single "best" model. It depends on what you're building, how much you're willing to spend, and what trade-offs matter. Here's a decision framework:
| Use Case | Budget Pick | Balanced Pick | Premium Pick |
|---|---|---|---|
| 💬 Chatbot / Customer Support | Gemini 2.5 Flash ($0.30/$2.50) | Claude Haiku 4.5 ($1.00/$5.00) | Claude Sonnet 4.6 ($3.00/$15.00) |
| 🧑‍💻 Code Generation | DeepSeek V3.2 ($0.28/$0.42) | Claude Sonnet 4.6 ($3.00/$15.00) | Claude Opus 4.6 ($5.00/$25.00) |
| 📊 Data Extraction / Classification | Mistral Small 3.1 ($0.10/$0.30) | GPT-5.4 nano ($0.20/$1.25) | GPT-5.4 mini ($0.75/$4.50) |
| 🔬 Research / Complex Reasoning | DeepSeek V3.2 Reasoner ($0.28/$0.42) | Gemini 2.5 Pro ($1.25/$10.00) | Claude Opus 4.6 ($5.00/$25.00) |
| 📄 Long Document Processing | Gemini 2.5 Flash (1M context) | Claude Sonnet 4.6 (1M context) | Gemini 3 Pro (1M+ context) |
| 🤖 Agentic / Multi-Step | Llama 4 Scout ($0.10/$0.30) | GPT-5.4 mini ($0.75/$4.50) | GPT-5.4 ($2.50/$15.00) |
| 📱 Edge / On-Device | Ministral 3B ($0.04/$0.04) | Mistral Small 3.1 ($0.10/$0.30) | Llama 4 Scout (self-host) |
| 💰 Maximum Savings (Batch) | DeepSeek V3.2 ($0.28/$0.42) | Gemini 2.5 Flash Batch ($0.15/$1.25) | Claude Sonnet Batch ($1.50/$7.50) |
Decision Cheat Sheet
💸 "I want the cheapest possible" → DeepSeek V3.2 or Mistral Small 3.1
⚡ "I need the fastest" → Gemini 2.5 Flash or Groq-hosted Llama
🧠 "I need the smartest" → Claude Opus 4.6 or GPT-5.4
📚 "I have huge documents" → Gemini (1M native) or Claude Sonnet/Opus (1M)
🔒 "I need data privacy" → Self-host Llama 4 or Mistral Small (open weights)
⚖️ "Best bang for buck" → Claude Sonnet 4.6 or Gemini 2.5 Pro, strong quality at mid-range prices
Interactive AI Cost Calculator
Comparing prices in a table only gets you so far. Real costs depend on your specific usage patterns: how many requests per day, average prompt size, output length, caching hit rate, and batch vs. real-time split.
📦 AI Model Cost Calculator
Enter your usage. Get instant cost estimates across every provider.
Free. No signup. No tracking.
Use the Calculator →
Methodology & Sources
All pricing data is sourced directly from official provider pricing pages and API documentation. Prices are for standard (pay-as-you-go) tier unless otherwise noted. Prices may vary by region, commitment level, or cloud marketplace.
- OpenAI – openai.com/api/pricing
- Anthropic – docs.anthropic.com
- Google – cloud.google.com/vertex-ai
- DeepSeek – api-docs.deepseek.com
- Mistral – docs.mistral.ai
Last verified: March 21, 2026. Prices can change without notice. Always check the official pricing page before making decisions. Open-source model hosting prices are approximate and vary by provider, plan, and GPU availability.