AI model pricing changes fast. New models launch monthly, prices drop, and old models get deprecated. This guide cuts through the noise with verified, side-by-side pricing for every major LLM provider as of March 2026, so you can pick the right model for your budget and use case.
💡 Pricing moves fast. All prices on this page are sourced from official provider pricing pages and verified as of March 21, 2026. We update this guide regularly. Bookmark it.
TL;DR: Quick Comparison Table
All prices per 1 million tokens. Sorted by input cost, cheapest first.
| Model | Provider | Input / 1M | Output / 1M | Cached Input | Context |
|---|---|---|---|---|---|
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | – | 128K |
| Llama 4 Maverick | Together | $0.20 | $0.60 | varies | 1M |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | $0.02 | 270K |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | $0.028 | 128K |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | $0.03 | 1M |
| Gemini 3 Flash | Google | $0.50 | $3.00 | $0.05 | 1M |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | $0.075 | 270K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $0.10 | 200K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | $0.13 | 1M |
| Magistral Medium | Mistral | $2.00 | $6.00 | – | 40K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | $0.20 | 1M |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | $0.25 | 270K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.30 | 1M |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | $0.50 | 1M |
🏆 Cheapest frontier-class model: DeepSeek V3.2 at $0.28 input / $0.42 output – nearly 10× cheaper than comparable models from OpenAI and Anthropic. But cheapest ≠ best. Read on.
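To see what these per-token rates mean per request, here's a minimal sketch in Python. The prices are hardcoded from the table above; the dictionary keys are illustrative labels, not provider API model identifiers.

```python
# Prices per 1M tokens (input, output), copied from the comparison table above.
PRICES = {
    "deepseek-v3.2":     (0.28, 0.42),
    "gpt-5.4-nano":      (0.20, 1.25),
    "gemini-2.5-flash":  (0.30, 2.50),
    "claude-sonnet-4.6": (3.00, 15.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in dollars for one request at standard (uncached) rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A typical chat turn: 2K prompt tokens in, 500 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f}")
```

Even at this small request size, the spread between the cheapest and priciest models is roughly 10×, which is why the routing and caching strategies later in this guide matter so much.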
Don't want to do the math?
Use our free AI Model Cost Calculator to estimate your monthly costs based on actual usage patterns.
Try the Calculator →
OpenAI Pricing Breakdown
OpenAI's current lineup centers on the GPT-5.4 family, which replaced GPT-4o and o-series models. The three tiers cover everything from lightweight classification to frontier-class reasoning.
| Model | Input / 1M | Cached Input | Output / 1M | Best For |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | Complex reasoning, professional work |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | Coding, agents, sub-tasks |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 | High-volume, simple tasks |
Key Details
- Context window: All GPT-5.4 models support up to 270K tokens at standard pricing.
- Batch API: 50% discount on both input and output when using the asynchronous Batch API (24-hour turnaround).
- Flex processing: Lower costs for non-production workloads in exchange for slower response times.
- Caching: 90% discount on cached inputs across all models. Automatic – no configuration needed.
- Data residency: Regional processing endpoints add 10% for models released after March 5, 2026.
⚠️ Legacy note: GPT-4o, o1, o3, and o3-mini have been deprecated in favor of the unified GPT-5.4 family. If you're still on older models, migration is recommended – the new models are both cheaper and more capable.
Anthropic (Claude) Pricing Breakdown
Anthropic's current generation is the Claude 4.6 series (Opus and Sonnet) plus Claude Haiku 4.5 as the fast/cheap option. Claude is known for strong coding, long-context, and instruction following.
| Model | Input / 1M | Cached Input | Output / 1M | Context | Max Output |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $0.50 | $25.00 | 1M | 128K |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | 1M | 64K |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | 200K | 64K |
Key Details
- Extended thinking: Available on all current models. Thinking tokens are billed at output rates.
- Prompt caching: 90% discount on cached inputs. Cache writes cost 25% more than base input price. Minimum 1,024 tokens to cache.
- Batch API: 50% discount on both input and output tokens. Results within 24 hours.
- 1M context: Opus 4.6 and Sonnet 4.6 support 1M tokens natively. Haiku 4.5 tops out at 200K.
- Available on: Anthropic API, AWS Bedrock, Google Vertex AI.
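The cache-write premium changes the arithmetic slightly: the first request pays 1.25× the base input rate on the cached prefix, and every later hit pays 0.1×, so caching pays for itself from the second request onward. A sketch using Sonnet 4.6's rates from the table (illustrative arithmetic, not an API call):

```python
BASE_INPUT = 3.00                 # Claude Sonnet 4.6, $ per 1M input tokens
CACHE_WRITE = BASE_INPUT * 1.25   # cache writes cost 25% more than base input
CACHE_HIT = BASE_INPUT * 0.10     # cache hits get the 90% discount

def cached_cost(prefix_tokens, calls):
    """Total prefix cost: one cache write, then (calls - 1) cache hits."""
    per_million = CACHE_WRITE + (calls - 1) * CACHE_HIT
    return prefix_tokens * per_million / 1_000_000

def uncached_cost(prefix_tokens, calls):
    """Total prefix cost paying the base input rate every time."""
    return prefix_tokens * calls * BASE_INPUT / 1_000_000

# A 2,048-token system prompt (above the 1,024-token caching minimum):
for n in (1, 2, 10):
    print(n, round(uncached_cost(2048, n), 6), round(cached_cost(2048, n), 6))
```

For a single call, caching is slightly more expensive because of the write premium; from the second call onward it is cheaper, converging toward the full 90% saving as reuse grows.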
Legacy Models Still Available
| Model | Input / 1M | Output / 1M | Status |
|---|---|---|---|
| Claude Sonnet 4.5 | $3.00 | $15.00 | Active (legacy) |
| Claude Opus 4.5 | $5.00 | $25.00 | Active (legacy) |
| Claude Sonnet 4 | $3.00 | $15.00 | Active (legacy) |
| Claude Opus 4 | $15.00 | $75.00 | Active (legacy) |
💡 Pro tip: Claude Opus 4 at $15/$75 is 3× more expensive than Opus 4.6 at $5/$25. If you're still on an older Opus model, upgrading saves money and gets you a better model.
Google (Gemini) Pricing Breakdown
Google runs two current generations: the new Gemini 3 series (preview) and the stable Gemini 2.5 series. Pricing is through Google AI Studio (consumer) or Vertex AI (enterprise). Prices below are Vertex AI standard tier.
| Model | Input / 1M | Cached Input | Output / 1M | Context |
|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 | $0.20 | $12.00 | 1M+ |
| Gemini 3 Pro (preview) | $2.00 | $0.20 | $12.00 | 1M+ |
| Gemini 3 Flash (preview) | $0.50 | $0.05 | $3.00 | 1M+ |
| Gemini 3.1 Flash-Lite (preview) | $0.25 | $0.03 | $1.50 | 1M+ |
| Gemini 2.5 Pro | $1.25 | $0.13 | $10.00 | 1M |
| Gemini 2.5 Flash | $0.30 | $0.03 | $2.50 | 1M |
Key Details
- Long context surcharge: Inputs over 200K tokens are charged at 2× the base rate for Pro models.
- Batch/Flex pricing: 50% discount on both input and output for asynchronous batch processing.
- Priority tier: 80% premium for guaranteed high-speed processing.
- Free tier: Google AI Studio offers generous free usage for Gemini 2.5 Flash (rate-limited). Great for prototyping.
- Grounding: Web search grounding is $14 per 1,000 queries after 5,000 free/month.
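The long-context surcharge makes Gemini Pro input costs step up sharply at the 200K boundary. Here's a sketch of one plausible reading of the rule, where the 2× rate applies to the entire prompt once it crosses the threshold; verify the exact tiering on the official pricing page before relying on it.

```python
BASE_RATE = 1.25      # Gemini 2.5 Pro, $ per 1M input tokens
THRESHOLD = 200_000   # long-context surcharge kicks in above this

def gemini_pro_input_cost(input_tokens):
    """Input cost in dollars; prompts over 200K tokens are billed at 2x
    the base rate. Assumes the 2x rate applies to the whole prompt, not
    just the excess -- confirm against the official pricing page."""
    rate = BASE_RATE * 2 if input_tokens > THRESHOLD else BASE_RATE
    return input_tokens * rate / 1_000_000

print(gemini_pro_input_cost(150_000))  # below the threshold: base rate
print(gemini_pro_input_cost(400_000))  # above it: 2x rate on all tokens
```

The practical takeaway: if your documents hover near 200K tokens, chunking them just under the threshold can halve the input bill.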
📌 Note: Gemini 2.5 Flash at $0.30/$2.50 is one of the best value-for-money models available. Huge 1M context window, strong reasoning, and thinking tokens are billed at the same output rate.
DeepSeek Pricing Breakdown
DeepSeek continues to be the price disruptor of the LLM market. Their V3.2 model unifies both chat and reasoning under a single endpoint, with the cheapest per-token pricing of any frontier-class model.
| Model | Input / 1M | Cached Input | Output / 1M | Context |
|---|---|---|---|---|
| DeepSeek V3.2 (deepseek-chat) | $0.28 | $0.028 | $0.42 | 128K |
| DeepSeek V3.2 Reasoning (deepseek-reasoner) | $0.28 | $0.028 | $0.42 | 128K |
Key Details
- Same pricing: Both chat and reasoning modes cost exactly the same – $0.28/$0.42.
- Cache discount: 90% off for cache hits, matching OpenAI and Anthropic's caching ratios.
- Reasoning mode: Supports up to 64K output tokens; chat mode defaults to 4K (max 8K).
- FIM completion: Fill-in-the-middle available for code editing (chat mode only).
- Availability: DeepSeek API can have availability issues during peak demand. Consider third-party hosts for production.
⚠️ Caveat: DeepSeek's pricing is unbeatable, but consider latency, rate limits, and reliability for production workloads. The API can be slow or unavailable during peak hours. Many teams use DeepSeek for batch/offline work and a more reliable provider for real-time.
Mistral Pricing Breakdown
Mistral has rebranded around their Magistral (reasoning), Mistral Small (general), and Devstral (code) families. They offer competitive pricing, especially at the small model tier.
| Model | Input / 1M | Output / 1M | Context | Best For |
|---|---|---|---|---|
| Magistral Medium 1.2 | $2.00 | $6.00 | 40K | Complex reasoning |
| Magistral Small | $0.50 | $1.50 | 40K | Lightweight reasoning |
| Mistral Small 3.1 | $0.10 | $0.30 | 128K | General purpose, agents |
| Devstral Small 2 | $0.50 | $1.50 | 128K | Code generation, agents |
| Ministral 8B | $0.10 | $0.10 | 128K | Edge, classification |
| Ministral 3B | $0.04 | $0.04 | 128K | Ultra-cheap, simple tasks |
Key Details
- Mistral Small 3.1: At $0.10/$0.30, this is one of the cheapest capable models available. Open weights (Apache 2.0), so you can self-host too.
- Ministral family: Purpose-built for edge and embedded use cases. Ministral 3B at $0.04/1M tokens is effectively free for most use cases.
- Batch API: Available on all models for async processing at reduced rates.
- Open source: Many Mistral models are open-weight, meaning you can self-host to avoid per-token costs entirely.
Open Source Models (Llama & Others)
Open-source models don't have a single "official" price โ it depends on your hosting provider. Here's what major inference providers charge for Meta's Llama 4 and other popular open models:
| Model | Provider | Input / 1M | Output / 1M |
|---|---|---|---|
| Llama 4 Maverick (400B MoE) | Together | $0.20 | $0.60 |
| Llama 4 Maverick (400B MoE) | Fireworks | $0.22 | $0.88 |
| Llama 4 Scout (109B MoE) | Together | $0.10 | $0.30 |
| Llama 4 Scout (109B MoE) | Fireworks | $0.15 | $0.60 |
| Llama 3.3 70B | Together | $0.10 | $0.30 |
| DeepSeek R1 (hosted) | Together | $0.17 | $0.51 |
| Qwen 3 235B MoE | Fireworks | $0.20 | $0.60 |
Self-Hosting vs. Inference APIs
- Inference APIs (Together, Fireworks, Groq, etc.): Pay per token. No infrastructure management. Best for variable or moderate workloads.
- Self-hosting (vLLM, TGI, Ollama): Pay for compute (GPU rental). Cost-effective at high volumes – a single A100 running a quantized Llama 3.3 70B can serve thousands of requests/hour for ~$2/hr.
- Break-even point: Self-hosting typically beats API pricing at roughly 50M+ tokens/day for a 70B model, though the exact threshold depends heavily on your blended API rate and GPU utilization.
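That break-even figure is worth sanity-checking against your own numbers, since it is highly sensitive to the blended API rate you would otherwise pay. A rough sketch, assuming a hypothetical $2/hr GPU at full utilization and ignoring ops overhead:

```python
def break_even_tokens_per_day(gpu_dollars_per_day, api_dollars_per_1m):
    """Daily token volume above which a rented GPU beats per-token API
    pricing. Ignores ops overhead and assumes full GPU utilization."""
    return gpu_dollars_per_day / api_dollars_per_1m * 1_000_000

# One ~$2/hr GPU ($48/day) vs. Together's Llama 3.3 70B at a 3:1
# input:output mix (blended: 0.75 * $0.10 + 0.25 * $0.30 = $0.15 per 1M):
print(break_even_tokens_per_day(48, 0.15))   # ~320M tokens/day

# Against a pricier hosted model at ~$1.00/1M blended, break-even drops:
print(break_even_tokens_per_day(48, 1.00))   # 48M tokens/day
```

As the two cases show, self-hosting pays off much sooner when the alternative is a mid-priced API than when it's a rock-bottom open-weight host.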
💡 Tip: Providers like Groq offer extremely fast inference (LPU hardware) with competitive pricing. If latency matters more than cost, check their current rates for Llama and Mistral models.
Cost Optimization Tips
The model you choose is only half the equation. How you use it matters just as much. These strategies can cut your LLM costs by 50–90%:
1. Prompt Caching
Every major provider now offers automatic prompt caching. If your requests share a common system prompt or context prefix, cached tokens are 90% cheaper. This is the single biggest cost lever for most applications.
Example: Chatbot with 4K system prompt
Without caching: 4,000 tokens × $3.00/1M = $0.012 per request
With caching: 4,000 tokens × $0.30/1M = $0.0012 per request
Savings: 90% on system prompt tokens – adds up fast at scale.
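The same arithmetic as a reusable helper. The $3.00/1M figure is Claude Sonnet 4.6's input rate from the tables above; the flat 90% cached-input discount matches the providers discussed here, but check your provider's exact ratio.

```python
def prompt_cost(tokens, rate_per_1m, cached=False):
    """Input cost in dollars for a prompt prefix.
    Cached tokens are assumed to get a 90% discount."""
    rate = rate_per_1m * 0.10 if cached else rate_per_1m
    return tokens * rate / 1_000_000

print(round(prompt_cost(4_000, 3.00), 6))               # uncached: 0.012
print(round(prompt_cost(4_000, 3.00, cached=True), 6))  # cached:   0.0012
```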
2. Model Routing
Don't use a $15/1M output model for tasks a $0.30/1M model can handle. Route requests to the cheapest capable model for each task:
- Classification, extraction, simple Q&A → GPT-5.4 nano, Mistral Small, or Gemini Flash
- Code generation, analysis, drafting → Claude Sonnet 4.6, GPT-5.4 mini, or Gemini 2.5 Pro
- Complex reasoning, research, novel problems → Claude Opus 4.6, GPT-5.4, or Gemini 3 Pro
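A minimal routing sketch of the tiers above. The task categories and model labels are illustrative strings, not provider API identifiers; real routers often add a confidence check that escalates to a stronger model when the cheap one fails.

```python
# Cheapest-capable-model routing, following the tiers above.
ROUTES = {
    "classification": "gpt-5.4-nano",      # simple, high-volume
    "extraction":     "mistral-small-3.1",
    "code":           "claude-sonnet-4.6",
    "drafting":       "gpt-5.4-mini",
    "research":       "claude-opus-4.6",   # frontier reasoning only when needed
}

def route(task_type, default="gemini-2.5-flash"):
    """Pick the cheapest model considered capable for this task type,
    falling back to a balanced default for unknown tasks."""
    return ROUTES.get(task_type, default)

print(route("classification"))  # cheap tier
print(route("poetry"))          # unknown task -> balanced fallback
```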
3. Batch Processing
If you don't need real-time responses, use batch APIs. OpenAI, Anthropic, and Google all offer 50% off for async batch jobs:
- OpenAI Batch API: 50% discount, 24-hour window
- Anthropic Message Batches: 50% discount, 24-hour window
- Google Vertex Batch/Flex: 50% discount
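Applied to a monthly workload, the 50% batch discount looks like this; the request volumes are hypothetical, and the rates are GPT-5.4 mini's from the OpenAI table above.

```python
def monthly_cost(requests, in_tok, out_tok, in_rate, out_rate, batch=False):
    """Monthly cost in dollars. Batch jobs get 50% off input and output."""
    cost = requests * (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return cost / 2 if batch else cost

# 1M requests/month on GPT-5.4 mini ($0.75 in / $4.50 out),
# averaging 1K input tokens and 300 output tokens per request:
realtime = monthly_cost(1_000_000, 1_000, 300, 0.75, 4.50)
batched  = monthly_cost(1_000_000, 1_000, 300, 0.75, 4.50, batch=True)
print(realtime, batched)  # 2100.0 1050.0
```

At this volume the discount is worth over a thousand dollars a month, purely for tolerating a 24-hour turnaround.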
4. Prompt Engineering
- Be concise: Every token costs money. Remove filler from prompts.
- Use structured output: JSON mode prevents rambling responses – fewer output tokens.
- Set max_tokens: Cap output length to prevent runaway generation.
- Compress context: Summarize conversation history instead of sending full transcripts.
5. Avoid Over-Thinking
Models with reasoning/thinking (Claude extended thinking, Gemini's thinking, GPT-5.4 reasoning) produce internal reasoning tokens billed at output rates. For simple tasks, disable thinking mode to avoid paying for unnecessary chain-of-thought tokens.
See exactly how much you'll spend
Plug in your actual usage numbers and compare costs across all providers side by side.
Open the Cost Calculator →
Which Model Should You Use?
There's no single "best" model. It depends on what you're building, how much you're willing to spend, and what trade-offs matter. Here's a decision framework:
| Use Case | Budget Pick | Balanced Pick | Premium Pick |
|---|---|---|---|
| 💬 Chatbot / Customer Support | Gemini 2.5 Flash ($0.30/$2.50) | Claude Haiku 4.5 ($1.00/$5.00) | Claude Sonnet 4.6 ($3.00/$15.00) |
| 🧑‍💻 Code Generation | DeepSeek V3.2 ($0.28/$0.42) | Claude Sonnet 4.6 ($3.00/$15.00) | Claude Opus 4.6 ($5.00/$25.00) |
| 📊 Data Extraction / Classification | Mistral Small 3.1 ($0.10/$0.30) | GPT-5.4 nano ($0.20/$1.25) | GPT-5.4 mini ($0.75/$4.50) |
| 🔬 Research / Complex Reasoning | DeepSeek V3.2 Reasoner ($0.28/$0.42) | Gemini 2.5 Pro ($1.25/$10.00) | Claude Opus 4.6 ($5.00/$25.00) |
| 📄 Long Document Processing | Gemini 2.5 Flash (1M context) | Claude Sonnet 4.6 (1M context) | Gemini 3 Pro (1M+ context) |
| 🤖 Agentic / Multi-Step | Llama 4 Scout ($0.10/$0.30) | GPT-5.4 mini ($0.75/$4.50) | GPT-5.4 ($2.50/$15.00) |
| 📱 Edge / On-Device | Ministral 3B ($0.04/$0.04) | Mistral Small 3.1 ($0.10/$0.30) | Llama 4 Scout (self-host) |
| 💰 Maximum Savings (Batch) | DeepSeek V3.2 ($0.28/$0.42) | Gemini 2.5 Flash Batch ($0.15/$1.25) | Claude Sonnet Batch ($1.50/$7.50) |
Decision Cheat Sheet
💸 "I want the cheapest possible" → DeepSeek V3.2 or Mistral Small 3.1
⚡ "I need the fastest" → Gemini 2.5 Flash or Groq-hosted Llama
🧠 "I need the smartest" → Claude Opus 4.6 or GPT-5.4
📚 "I have huge documents" → Gemini (1M native) or Claude Sonnet/Opus (1M)
🔒 "I need data privacy" → Self-host Llama 4 or Mistral Small (open weights)
⚖️ "Best bang for buck" → Claude Sonnet 4.6 or Gemini 2.5 Pro, strong quality at mid-range prices
Interactive AI Cost Calculator
Comparing prices in a table only gets you so far. Real costs depend on your specific usage patterns: how many requests per day, average prompt size, output length, caching hit rate, and batch vs. real-time split.
📦 AI Model Cost Calculator
Enter your usage. Get instant cost estimates across every provider.
Free. No signup. No tracking.
Use the Calculator →
Methodology & Sources
All pricing data is sourced directly from official provider pricing pages and API documentation. Prices are for standard (pay-as-you-go) tier unless otherwise noted. Prices may vary by region, commitment level, or cloud marketplace.
- OpenAI – openai.com/api/pricing
- Anthropic – docs.anthropic.com
- Google – cloud.google.com/vertex-ai
- DeepSeek – api-docs.deepseek.com
- Mistral – docs.mistral.ai
Last verified: March 21, 2026. Prices can change without notice. Always check the official pricing page before making decisions. Open-source model hosting prices are approximate and vary by provider, plan, and GPU availability.