LLM cost by model: Claude vs GPT vs Gemini vs Grok compared

Every model has a published price. But the number that shows up on your invoice depends on how many tokens you actually use — and that varies enormously by model, task, and how well your prompts are written.

This page gives you the full pricing table for every active model we support, plus the context you need to read it correctly: what "per million tokens" actually means at real usage volumes, which models suit which task types, and where most teams quietly overspend.

Pricing table — all active models

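All prices are in USD per million tokens. Input tokens are the text (and images) you send; output tokens are what the model returns. Output usually costs more than input: roughly 4–8× for the Anthropic, OpenAI, and Google models in this table, and as little as 1–2.5× for some xAI, DeepSeek, Meta, and Perplexity models. Table last updated May 15, 2026.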

Model	Input / 1M tok	Output / 1M tok
Anthropic
`claude-opus-4-7`	$5.00	$25.00
`claude-sonnet-4-6`	$3.00	$15.00
`claude-haiku-4-5-20251001`	$1.00	$5.00
`claude-opus-4-6`	$5.00	$25.00
`claude-sonnet-4-5-20250929`	$3.00	$15.00
`claude-opus-4-5-20251101`	$5.00	$25.00
`claude-opus-4-1-20250805`	$15.00	$75.00
OpenAI
`gpt-5.5`	$5.00	$30.00
`gpt-5.5-pro`	$30.00	$180.00
`gpt-5.4`	$2.50	$15.00
`gpt-5.4-mini`	$0.75	$4.50
`gpt-5.4-nano`	$0.20	$1.25
`gpt-5.4-pro`	$30.00	$180.00
`gpt-5.2`	$1.75	$14.00
`gpt-5.2-pro`	$21.00	$168.00
`gpt-5.1`	$1.25	$10.00
`gpt-5`	$1.25	$10.00
`gpt-5-mini`	$0.25	$2.00
`gpt-5-nano`	$0.05	$0.40
`gpt-5-pro`	$15.00	$120.00
`gpt-4.1`	$2.00	$8.00
`gpt-4.1-mini`	$0.40	$1.60
`gpt-4.1-nano`	$0.10	$0.40
`o3-pro`	$20.00	$80.00
`o3`	$2.00	$8.00
`o4-mini`	$1.10	$4.40
`o3-mini`	$1.10	$4.40
Google
`gemini-3.1-pro-preview`	$2.00	$12.00
`gemini-3.1-flash-lite-preview`	$0.25	$1.50
`gemini-3-flash-preview`	$0.50	$3.00
`gemini-2.5-pro`	$1.25	$10.00
`gemini-2.5-flash`	$0.30	$2.50
`gemini-2.5-flash-lite`	$0.10	$0.40
xAI
`grok-4.3`	$1.25	$2.50
`grok-4.20-0309-reasoning`	$1.25	$2.50
`grok-4.20-0309-non-reasoning`	$1.25	$2.50
`grok-4.20-multi-agent-0309`	$1.25	$2.50
`grok-4-1-fast-reasoning`	$0.20	$0.50
`grok-4-1-fast-non-reasoning`	$0.20	$0.50
DeepSeek
`deepseek-v4-flash`	$0.14	$0.28
`deepseek-v4-pro`	$1.74	$3.48
Meta (Llama)
`meta-llama/llama-4-scout`	$0.10	$0.40
`meta-llama/llama-4-maverick`	$0.25	$1.00
`meta-llama/llama-3.3-70b-instruct`	$0.71	$0.71
`meta-llama/llama-3.1-8b-instruct`	$0.02	$0.05
Perplexity
`sonar`	$1.00	$1.00
`sonar-pro`	$3.00	$15.00
`sonar-reasoning-pro`	$2.00	$8.00
`sonar-deep-research`	$2.00	$8.00

Prices are sourced from provider documentation and kept in sync with our pricing.ts source file. Use our calculator to model costs at your actual call volume. For Meta/Llama models, prices reflect Azure Global tier or the midpoint of major inference providers, so actual cost depends on your provider. Perplexity Sonar models carry an additional per-request fee not reflected here.

What "per million tokens" means at real volumes

A million tokens sounds like a lot, and in absolute terms it is: roughly 750,000 words, or about 10 average-length novels. A single call uses only a tiny fraction of that, but per-call costs multiply across thousands of calls and users, so the math still matters.

Here's a more useful frame: at 1,000 calls per day with an average of 2,000 tokens per call (1,500 input + 500 output), using claude-sonnet-4-6:

Input cost: 1,500 tok × 1,000 calls × $3.00 / 1M = $4.50/day

Output cost: 500 tok × 1,000 calls × $15.00 / 1M = $7.50/day

Total: $12.00/day → ~$365/month

Switch to claude-haiku-4-5-20251001 for the same workload and that becomes ~$4.00/day ($1.50 input + $2.50 output), a 3× difference. Whether that trade-off is worth it depends entirely on the task.
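
The Sonnet arithmetic above generalizes to any row of the table. A minimal Python sketch, assuming a 30.4-day month; the function name `daily_cost` is illustrative and not part of any provider SDK, and the prices are copied from the table:

```python
def daily_cost(calls_per_day, input_tokens, output_tokens, input_price, output_price):
    """Return USD per day; prices are USD per 1M tokens."""
    input_cost = calls_per_day * input_tokens * input_price / 1_000_000
    output_cost = calls_per_day * output_tokens * output_price / 1_000_000
    return input_cost + output_cost

# claude-sonnet-4-6 prices from the table: $3.00 input, $15.00 output per 1M tokens.
sonnet_daily = daily_cost(1_000, 1_500, 500, 3.00, 15.00)

# 30.4 is an assumed average month length in days.
print(f"${sonnet_daily:.2f}/day, ~${sonnet_daily * 30.4:.0f}/month")
```

Swapping in another row's input and output prices gives the figure for that model under the same traffic assumptions.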

Which model for which task

Published benchmarks tell you about capability floors. They don't tell you what matters: cost-per-acceptable-output on your specific workload. That said, some patterns hold broadly:

Structured extraction

Parsing documents, extracting fields, classifying text. Output is short and deterministic. Use the cheapest model that hits your accuracy threshold — usually a Haiku, Flash, or Nano tier.

Long-form generation

Reports, drafts, summaries of long documents. Output is long; input may be long too. Model quality matters more here — but so does prompt efficiency, since a bloated system prompt at this scale adds up fast.

Multi-step reasoning

Coding agents, complex analysis, tasks requiring tool use. This is where flagship models justify their cost. Substituting a cheaper model here often means more retries, which erases the savings.

User-facing chat

Response quality is visible and directly affects retention. Don't cut corners here until you have data. Track cost-per-session, not cost-per-call — sessions are what your users experience.
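
One way to make "cost-per-acceptable-output" concrete is to divide the per-call cost by the fraction of calls that produce a usable result, which is the expected cost when failed attempts are retried until one succeeds. The Python sketch below assumes independent attempts with a fixed acceptance rate. The token counts and acceptance rates are made-up illustration values, not measurements, and the function names are hypothetical.

```python
def cost_per_call(input_tokens, output_tokens, input_price, output_price):
    """USD for one call; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def cost_per_acceptable_output(call_cost, acceptance_rate):
    """Expected USD per usable result when each attempt succeeds
    independently with probability `acceptance_rate`."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return call_cost / acceptance_rate

# Hypothetical agent step: 6,000 input + 1,500 output tokens per attempt.
opus = cost_per_call(6_000, 1_500, 5.00, 25.00)    # claude-opus-4-7 prices
sonnet = cost_per_call(6_000, 1_500, 3.00, 15.00)  # claude-sonnet-4-6 prices

# Assumed acceptance rates, for illustration only.
opus_effective = cost_per_acceptable_output(opus, 0.90)
sonnet_effective = cost_per_acceptable_output(sonnet, 0.50)
print(f"opus ${opus_effective:.4f}, sonnet ${sonnet_effective:.4f} per usable result")
```

With these assumed rates the cheaper model costs more per usable result (about $0.081 versus $0.075), which is how retries can erase the savings on multi-step work. Measuring the real acceptance rate on your own workload is the part that matters.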

Where most teams overspend

In practice, the model choice is rarely the biggest cost lever. The three largest sources of waste we see:

Oversized system prompts sent on every call. A 2,000-token system prompt running 50,000 times a month adds 100M input tokens — $300/month on Sonnet, before you've done anything useful. Audit your static prompt content; cache what you can.
Failed calls that still bill. Calls that return an error, an empty output, or a malformed response consume tokens and cost money. They're also invisible in your provider dashboard. On typical production workloads we see 3–8% wasted spend from failed calls.
Using the wrong model tier for a feature. Teams often default to their "main" model for new features without benchmarking cheaper alternatives. A feature that started on gpt-4o in week one may have run fine on gpt-4.1-nano all along.
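
The first two items can be estimated from numbers you probably already have. A rough Python sketch with hypothetical function names; the $2,000 monthly spend and 5% failure rate are placeholders (the rate is picked from within the 3–8% range above), not measurements:

```python
def system_prompt_overhead(prompt_tokens, calls_per_month, input_price):
    """Monthly USD spent re-sending a static system prompt (price per 1M tokens)."""
    return prompt_tokens * calls_per_month * input_price / 1_000_000

def failed_call_waste(monthly_spend, failure_rate):
    """Monthly USD spent on calls that returned nothing usable, assuming
    failed calls cost about as much as successful ones."""
    return monthly_spend * failure_rate

# The Sonnet example above: 2,000 tokens x 50,000 calls at $3.00 per 1M.
overhead = system_prompt_overhead(2_000, 50_000, 3.00)
# Placeholder inputs: $2,000/month total spend, 5% of it on failed calls.
waste = failed_call_waste(2_000.00, 0.05)

print(f"system prompt: ${overhead:.2f}/month, failed calls: ${waste:.2f}/month")
```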

The only way to find these is per-feature cost tracking. Your provider bill doesn't break this down — our cost calculator can help you model it, and the dashboard surfaces it automatically once you instrument your calls.
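
As an illustration of what per-feature tracking involves, the Python sketch below sums spend per feature from a list of call records and keeps failed calls as a separate figure. It is not how the dashboard is implemented. The record keys (`feature`, `model`, `input_tokens`, `output_tokens`, `ok`) and the sample calls are assumptions made for the example; the prices are copied from the table.

```python
from collections import defaultdict

# USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-4.1-nano": (0.10, 0.40),
}

def cost_by_feature(calls):
    """Sum spend per feature, tracking spend on failed calls separately."""
    totals = defaultdict(lambda: {"total": 0.0, "wasted": 0.0})
    for call in calls:
        input_price, output_price = PRICES[call["model"]]
        cost = (call["input_tokens"] * input_price
                + call["output_tokens"] * output_price) / 1_000_000
        totals[call["feature"]]["total"] += cost
        if not call["ok"]:
            totals[call["feature"]]["wasted"] += cost
    return dict(totals)

# Hypothetical call log: one failed summarization call still bills its input.
calls = [
    {"feature": "summarize", "model": "claude-sonnet-4-6",
     "input_tokens": 4_000, "output_tokens": 800, "ok": True},
    {"feature": "summarize", "model": "claude-sonnet-4-6",
     "input_tokens": 4_000, "output_tokens": 0, "ok": False},
    {"feature": "tagging", "model": "gpt-4.1-nano",
     "input_tokens": 600, "output_tokens": 20, "ok": True},
]

report = cost_by_feature(calls)
for feature, spend in report.items():
    print(f"{feature}: total ${spend['total']:.6f}, wasted ${spend['wasted']:.6f}")
```

In practice the records would come from wherever you log model calls; the key design choice is tagging each call with a feature name at the call site so the breakdown is possible later.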

Caching and batch discounts

Most providers offer two mechanisms that can significantly reduce costs:

Prompt caching (Anthropic, OpenAI) lets you cache a portion of your input — typically the system prompt — so repeated calls reuse the cached prefix at a fraction of the full input price. Anthropic charges ~10% of normal input price on cache reads. If your system prompts are long and stable, this is the highest-ROI optimization available.
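
To see what caching is worth, compare the input cost of a shared prefix with and without cache reads. The Python sketch below uses the ~10% read rate mentioned above. It also assumes a 1.25× price on the call that writes the cache (Anthropic's published rate for its default short-lived cache at the time of writing; check current documentation) and that every later call hits the cache, which is optimistic. The function name is illustrative.

```python
def cached_prefix_cost(prefix_tokens, calls, input_price,
                       read_multiplier=0.10, write_multiplier=1.25):
    """Return (uncached_usd, cached_usd) for the shared prefix only.

    Assumes one cache write followed by a cache hit on every other call.
    Real caches expire and get rewritten, so actual savings are lower.
    """
    per_call = prefix_tokens * input_price / 1_000_000
    uncached = per_call * calls
    cached = per_call * write_multiplier + per_call * read_multiplier * (calls - 1)
    return uncached, cached

# A 2,000-token system prompt sent 50,000 times at $3.00 per 1M input tokens.
uncached, cached = cached_prefix_cost(2_000, 50_000, 3.00)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

Under these assumptions the prefix cost drops by roughly 90%. The rest of each request (the uncached part of the input, plus all output) is billed as usual.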

Batch processing (Anthropic Batch API, OpenAI Batch) offers 50% discounts in exchange for async processing with up to 24-hour turnaround. Useful for offline workloads — document processing, bulk analysis, nightly jobs — where latency doesn't matter.
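
The batch discount is simple to model. A Python sketch using the 50% figure above; the nightly job (10,000 documents at 3,000 input and 300 output tokens each) is a hypothetical workload, priced at the gpt-5.4-mini row of the table:

```python
def batch_savings(input_tokens, output_tokens, input_price, output_price, discount=0.50):
    """Return (standard_usd, batch_usd) for a job; prices are USD per 1M tokens."""
    standard = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return standard, standard * (1 - discount)

# Hypothetical nightly job: 10,000 documents, 3,000 input + 300 output tokens each.
standard, batch = batch_savings(10_000 * 3_000, 10_000 * 300, 0.75, 4.50)
print(f"standard ${standard:.2f}/night, batch ${batch:.2f}/night")
```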

A note on keeping this current

Model pricing changes frequently — providers discount new models to drive adoption, then adjust as demand grows. This table is driven directly from our pricing.ts source file, which we update within 24 hours of any provider price change. The last updated date at the top of the table always reflects the current version. If you need to model costs at your specific usage volume, the LLM Cost Tracker calculator uses the same source.
