Every model has a published price. But the number that shows up on your invoice depends on how many tokens you actually use — and that varies enormously by model, task, and how well your prompts are written.
This page gives you the full pricing table for every active model we support, plus the context you need to read it correctly: what "per million tokens" actually means at real usage volumes, which models suit which task types, and where most teams quietly overspend.
Pricing table — all active models
All prices are in USD per million tokens. Input tokens are the text (and images) you send; output tokens are what the model returns. Output almost always costs more than input: across this table the ratio runs from 1× to about 8×, with most Anthropic, OpenAI, and Google models at 4–8×. Table last updated May 15, 2026.
| Model | Input / 1M tok | Output / 1M tok |
|---|---|---|
| Anthropic | | |
| claude-opus-4-7 | $5.00 | $25.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4-5-20250929 | $3.00 | $15.00 |
| claude-opus-4-5-20251101 | $5.00 | $25.00 |
| claude-opus-4-1-20250805 | $15.00 | $75.00 |
| OpenAI | | |
| gpt-5.5 | $5.00 | $30.00 |
| gpt-5.5-pro | $30.00 | $180.00 |
| gpt-5.4 | $2.50 | $15.00 |
| gpt-5.4-mini | $0.75 | $4.50 |
| gpt-5.4-nano | $0.20 | $1.25 |
| gpt-5.4-pro | $30.00 | $180.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.1 | $1.25 | $10.00 |
| gpt-5 | $1.25 | $10.00 |
| gpt-5-mini | $0.25 | $2.00 |
| gpt-5-nano | $0.05 | $0.40 |
| gpt-5-pro | $15.00 | $120.00 |
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| o3-pro | $20.00 | $80.00 |
| o3 | $2.00 | $8.00 |
| o4-mini | $1.10 | $4.40 |
| o3-mini | $1.10 | $4.40 |
| Google | | |
| gemini-3.1-pro-preview | $2.00 | $12.00 |
| gemini-3.1-flash-lite-preview | $0.25 | $1.50 |
| gemini-3-flash-preview | $0.50 | $3.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| xAI | | |
| grok-4.3 | $1.25 | $2.50 |
| grok-4.20-0309-reasoning | $1.25 | $2.50 |
| grok-4.20-0309-non-reasoning | $1.25 | $2.50 |
| grok-4.20-multi-agent-0309 | $1.25 | $2.50 |
| grok-4-1-fast-reasoning | $0.20 | $0.50 |
| grok-4-1-fast-non-reasoning | $0.20 | $0.50 |
| DeepSeek | | |
| deepseek-v4-flash | $0.14 | $0.28 |
| deepseek-v4-pro | $1.74 | $3.48 |
| Meta (Llama) | | |
| meta-llama/llama-4-scout | $0.10 | $0.40 |
| meta-llama/llama-4-maverick | $0.25 | $1.00 |
| meta-llama/llama-3.3-70b-instruct | $0.71 | $0.71 |
| meta-llama/llama-3.1-8b-instruct | $0.02 | $0.05 |
| Perplexity | | |
| sonar | $1.00 | $1.00 |
| sonar-pro | $3.00 | $15.00 |
| sonar-reasoning-pro | $2.00 | $8.00 |
| sonar-deep-research | $2.00 | $8.00 |
Prices sourced from provider documentation and kept in sync with our pricing.ts source file. Use our calculator to model costs at your actual call volume. For Meta/Llama models, prices reflect Azure Global tier or the midpoint of major inference providers — actual cost depends on your provider. Perplexity Sonar models carry an additional per-request fee not reflected here.
What "per million tokens" means at real volumes
A million tokens sounds like a lot: roughly 750,000 words, or about 10 average-length novels. Few individual calls come anywhere close to that, but at production volume a million tokens goes quickly, because small per-call costs multiply across thousands of calls and users.
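Here's a more useful frame: at 1,000 calls per day with an average of 2,000 tokens per call (1,500 input + 500 output), using claude-sonnet-4-6:
- Input: 1.5M tokens × $3.00 per million = $4.50/day
- Output: 0.5M tokens × $15.00 per million = $7.50/day
- Total: $12.00/day, or about $360 over a 30-day month
Switch to claude-haiku-4-5 for the same workload and that becomes $4.00/day ($1.50 input + $2.50 output), a 3× difference. Whether that trade-off is worth it depends entirely on the task.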
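The arithmetic for this workload can be sketched in a few lines of TypeScript. This is a minimal illustration, not the contents of our pricing.ts file: the `PRICES` map, the `ModelPrice` shape, and the `dailyCost` helper are hypothetical names, and the two prices are copied from the table above.

```typescript
// Hypothetical price map; values are USD per million tokens, taken from the table.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES: Record<string, ModelPrice> = {
  "claude-sonnet-4-6": { inputPerMTok: 3.0, outputPerMTok: 15.0 },
  "claude-haiku-4-5-20251001": { inputPerMTok: 1.0, outputPerMTok: 5.0 },
};

// Daily cost in USD for a workload with fixed average token counts per call.
function dailyCost(
  model: string,
  callsPerDay: number,
  inputTokensPerCall: number,
  outputTokensPerCall: number,
): number {
  const price = PRICES[model];
  if (price === undefined) {
    throw new Error(`No price entry for model: ${model}`);
  }
  const inputCost = ((callsPerDay * inputTokensPerCall) / 1e6) * price.inputPerMTok;
  const outputCost = ((callsPerDay * outputTokensPerCall) / 1e6) * price.outputPerMTok;
  return inputCost + outputCost;
}

const sonnet = dailyCost("claude-sonnet-4-6", 1000, 1500, 500);
const haiku = dailyCost("claude-haiku-4-5-20251001", 1000, 1500, 500);

console.log(`Sonnet: $${sonnet.toFixed(2)}/day`); // Sonnet: $12.00/day
console.log(`Haiku:  $${haiku.toFixed(2)}/day`);  // Haiku:  $4.00/day
console.log(`Ratio:  ${(sonnet / haiku).toFixed(1)}x`); // Ratio:  3.0x
```

Keeping input and output as separate terms matters: output is priced several times higher than input for most models, so a workload's input/output mix can move the total more than the headline input price suggests.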
Which model for which task
Published benchmarks tell you about capability floors. They don't tell you what matters: cost-per-acceptable-output on your specific workload. That said, some patterns hold broadly:
Where most teams overspend
In practice, the model choice is rarely the biggest cost lever. The three largest sources of waste we see:
- Oversized system prompts sent on every call. A 2,000-token system prompt running 50,000 times a month adds 100M input tokens — $300/month on Sonnet, before you've done anything useful. Audit your static prompt content; cache what you can.
- Failed calls that still bill. Calls that return an error, an empty output, or a malformed response consume tokens and cost money. They're also invisible in your provider dashboard. On typical production workloads we see 3–8% wasted spend from failed calls.
- Using the wrong model tier for a feature. Teams often default to their "main" model for new features without benchmarking cheaper alternatives. A feature that started on gpt-4o in week one may have run fine on gpt-4.1-nano all along.
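The first two items are easy to put a number on. The sketch below is a rough estimate under stated assumptions: `systemPromptCost` and `failedCallWaste` are hypothetical helper names, the $3.00 input price is the Sonnet rate from the table, and the $1,000 monthly spend is an invented figure used only to show the 3–8% range.

```typescript
// Monthly cost in USD of a static system prompt that is resent on every call.
function systemPromptCost(
  promptTokens: number,
  callsPerMonth: number,
  inputPricePerMTok: number,
): number {
  return ((promptTokens * callsPerMonth) / 1e6) * inputPricePerMTok;
}

// Spend attributable to calls that billed but produced nothing usable.
// failureShare is the fraction of spend lost, e.g. 0.03 for 3%.
function failedCallWaste(monthlySpend: number, failureShare: number): number {
  if (failureShare < 0 || failureShare > 1) {
    throw new RangeError("failureShare must be between 0 and 1");
  }
  return monthlySpend * failureShare;
}

// 2,000-token system prompt, 50,000 calls a month, Sonnet input pricing.
console.log(`System prompt: $${systemPromptCost(2000, 50000, 3.0).toFixed(2)}/month`);

// Hypothetical $1,000/month bill at the low and high end of the 3-8% range.
const monthlySpend = 1000; // assumed figure, for illustration only
console.log(`Failed calls (3%): $${failedCallWaste(monthlySpend, 0.03).toFixed(2)}/month`);
console.log(`Failed calls (8%): $${failedCallWaste(monthlySpend, 0.08).toFixed(2)}/month`);
```

Your actual failure share has to come from your own logs, since failed calls are not broken out in provider dashboards.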
Caching and batch discounts
Most providers offer two mechanisms that can significantly reduce costs:
Prompt caching (Anthropic, OpenAI) lets you cache a portion of your input — typically the system prompt — so repeated calls reuse the cached prefix at a fraction of the full input price. Anthropic charges ~10% of normal input price on cache reads. If your system prompts are long and stable, this is the highest-ROI optimization available.
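To see what a cache-read discount does to per-call input cost, here is a simplified sketch. It assumes every call after the first hits the cache, and it deliberately ignores any cache-write surcharge and cache expiry, so treat the result as an upper bound on savings. The function name and the 2,000/500 token split are hypothetical; the 0.10 multiplier is the ~10% cache-read rate quoted above.

```typescript
// Input cost in USD for one call, split into a cacheable prefix and a dynamic remainder.
// cacheReadMultiplier = 1.0 means no caching; 0.10 models a ~10% cache-read rate.
function inputCostPerCall(
  prefixTokens: number,
  dynamicTokens: number,
  inputPricePerMTok: number,
  cacheReadMultiplier: number = 1.0,
): number {
  const prefixCost = (prefixTokens / 1e6) * inputPricePerMTok * cacheReadMultiplier;
  const dynamicCost = (dynamicTokens / 1e6) * inputPricePerMTok;
  return prefixCost + dynamicCost;
}

// Assumed workload: 2,000-token stable system prompt + 500 tokens of user input,
// priced at the claude-sonnet-4-6 input rate from the table ($3.00 per million).
const uncached = inputCostPerCall(2000, 500, 3.0);
const cached = inputCostPerCall(2000, 500, 3.0, 0.1);
const savedPct = (1 - cached / uncached) * 100;

console.log(`Uncached: $${uncached.toFixed(4)} per call`);
console.log(`Cached:   $${cached.toFixed(4)} per call`);
console.log(`Input cost reduced by about ${savedPct.toFixed(0)}%`);
```

The saving scales with the share of each call that is stable prefix, which is why long, unchanging system prompts benefit most.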
Batch processing (Anthropic Batch API, OpenAI Batch) offers 50% discounts in exchange for async processing with up to 24-hour turnaround. Useful for offline workloads — document processing, bulk analysis, nightly jobs — where latency doesn't matter.
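A quick way to size the batch opportunity is to estimate what fraction of your spend comes from work that can wait. The sketch below is illustrative only: `costWithBatch` is a hypothetical helper, the 50% discount is the figure quoted above, and the $360 monthly bill and 40% offline share are assumed inputs.

```typescript
// Monthly cost in USD after moving the offline share of a workload to a batch API.
// offlineShare: fraction of spend that tolerates async processing (0 to 1).
// batchDiscount: fractional discount on batched work (0.5 = 50% off).
function costWithBatch(
  monthlyCost: number,
  offlineShare: number,
  batchDiscount: number = 0.5,
): number {
  if (offlineShare < 0 || offlineShare > 1) {
    throw new RangeError("offlineShare must be between 0 and 1");
  }
  const realtimeCost = monthlyCost * (1 - offlineShare);
  const batchedCost = monthlyCost * offlineShare * (1 - batchDiscount);
  return realtimeCost + batchedCost;
}

// Assumed example: $360/month total, 40% of it from nightly or bulk jobs.
const before = 360;
const after = costWithBatch(before, 0.4);
console.log(`Before: $${before.toFixed(2)}/month`);
console.log(`After:  $${after.toFixed(2)}/month`);
```

The discount applies only to the portion you can actually move, so a workload that is mostly interactive sees a much smaller total reduction than the headline 50%.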
A note on keeping this current
Model pricing changes frequently — providers discount new models to drive adoption, then adjust as demand grows. This table is driven directly from our pricing.ts source file, which we update within 24 hours of any provider price change. The last updated date at the top of the table always reflects the current version. If you need to model costs at your specific usage volume, the LLM Cost Tracker calculator uses the same source.