AI API Pricing Comparison: Cost Per Million Tokens
Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.
AI API Pricing Comparison: Cost Per Million Tokens
AI API pricing changes frequently as providers compete on cost and capability. This page compares current pricing across all major providers in a single, easy-to-reference table. Bookmark it and check back regularly for updates.
AI model comparisons are based on publicly available benchmarks and editorial testing. Results may vary by use case.
Complete Pricing Table (March 2026)
Premium Tier Models
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
|---|---|---|---|---|---|
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K | Strongest reasoning |
| o3 | OpenAI | $10.00 | $40.00 | 200K | + thinking token costs |
| Gemini Ultra | $7.00 | $21.00 | 1M+ | Largest context window |
Mid-Tier Models (Best Value)
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
|---|---|---|---|---|---|
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K | Best quality/cost ratio |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K | Strong generalist |
| Mistral Large | Mistral | $2.00 | $6.00 | 128K | Good multilingual |
| Gemini Pro | $1.25 | $5.00 | 1M+ | Great value with large context |
Budget Tier Models
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
|---|---|---|---|---|---|
| o3-mini | OpenAI | $1.10 | $4.40 | 200K | Budget reasoning |
| Claude Haiku 4 | Anthropic | $0.25 | $1.25 | 200K | Fast, very cheap |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Budget general purpose |
| Gemini Flash | $0.075 | $0.30 | 1M+ | Cheapest capable model |
All prices as of March 2026. Check provider websites for the latest pricing.
Cost Per Common Task
| Task | Approximate Tokens | Opus 4 | Sonnet 4 | GPT-4o | Haiku 4 | Flash |
|---|---|---|---|---|---|---|
| Single question + answer | 500 in / 300 out | $0.03 | $0.006 | $0.004 | $0.0005 | $0.0001 |
| Blog post generation | 500 in / 1,500 out | $0.12 | $0.024 | $0.016 | $0.002 | $0.0005 |
| Document summary (20 pages) | 15K in / 500 out | $0.26 | $0.053 | $0.043 | $0.004 | $0.001 |
| Code review (large file) | 5K in / 2K out | $0.23 | $0.045 | $0.033 | $0.004 | $0.001 |
| Full book analysis | 150K in / 2K out | $2.40 | $0.48 | N/A* | $0.04 | $0.01 |
GPT-4o’s 128K limit means it cannot process a full book in one pass.
Cost Reduction Features
Prompt Caching (Anthropic)
Anthropic offers prompt caching that reduces the cost of repeated context (system prompts, reference documents) by up to 90%. This is significant for applications that reuse the same context across many queries.
- Cache write: 1.25x base input cost
- Cache read: 0.1x base input cost (90% discount)
Batch Processing
Several providers offer discounts for non-time-sensitive batch processing:
| Provider | Batch Discount | Turnaround |
|---|---|---|
| Anthropic | 50% off | Within 24 hours |
| OpenAI | 50% off | Within 24 hours |
| Varies | Varies |
Volume Discounts
Enterprise agreements with committed usage can reduce pricing further. Contact providers directly for volume pricing.
Price Trends
AI API pricing has fallen dramatically:
| Year | Cost for GPT-4-class 1M Output Tokens | Reduction |
|---|---|---|
| 2023 | ~$60.00 | Baseline |
| 2024 | ~$15.00 | 75% reduction |
| 2025 | ~$10.00 | 83% reduction |
| 2026 | ~$5-10.00 | 85-92% reduction |
Expect continued price reductions as inference efficiency improves and competition intensifies.
Choosing the Right Tier
Use premium models when:
- Complex reasoning, analysis, or coding is required
- Accuracy on difficult problems justifies the cost
- The cost of errors exceeds the cost of using a better model
Use mid-tier models when:
- You need good quality for general tasks
- You want the best quality-to-cost ratio
- Volume is moderate (hundreds to thousands of queries/day)
Use budget models when:
- Tasks are simple (classification, extraction, routing)
- Volume is very high (millions of queries/day)
- Speed matters more than maximum quality
- You are building a first pass before human review
AI Costs Explained: API Pricing, Token Limits, and Hidden Fees
Key Takeaways
- AI API pricing spans a 100x range from Gemini Flash ($0.075/1M input) to Claude Opus 4 ($15/1M input).
- Mid-tier models (Claude Sonnet 4, GPT-4o) offer the best quality-to-cost ratio for most applications.
- Prompt caching and batch processing can reduce costs by 50-90% for eligible workloads.
- Prices have dropped 85-92% since 2023 and continue falling.
- Output tokens are always more expensive than input tokens (2-5x), so controlling output length saves money.
Next Steps
- Calculate your estimated monthly spend: AI Cost Calculator: Estimate Your Monthly API Spend.
- Understand tokens and pricing in depth: AI Costs Explained: API Pricing, Token Limits, and Hidden Fees.
- Count tokens in your content: Token Counter Tool: Paste Text, See Token Count.
- Compare subscription plans as an alternative to API pricing: ChatGPT Plus vs Claude Pro vs Gemini Advanced: Subscription Comparison.
This content is for informational purposes only and reflects independently researched comparisons. AI model capabilities change frequently — verify current specs with providers. Not professional advice.