How Much Does the Claude API Cost in 2026? A Real Breakdown With Examples
Anthropic's Claude API pricing looks simple on the official pricing page, but the real cost of running Claude in production is rarely what the calculator predicts. This guide breaks down every Claude model's price, shows what actual agents cost per request, and explains the caching tricks that can slash your bill by 80% or more.
Claude API Pricing in 2026: The Full Table
Anthropic charges per million tokens, with separate rates for input and output. As of April 2026, here are the published rates for the main Claude 4.x family:
| Model | Input ($/MTok) | Output ($/MTok) | Cache Write | Cache Read |
|---|---|---|---|---|
| Claude Opus 4.7 | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Haiku 4.5 | $0.80 | $4.00 | $1.00 | $0.08 |
A few things to notice immediately:
- Output is 5x more expensive than input across every model. Long completions are where bills explode.
- Cache reads cost 10% of normal input. If your agent has a stable system prompt, this is your single biggest lever.
- Haiku 4.5 is roughly 19x cheaper than Opus 4.7 on input, and benchmarks show it handles 80%+ of routing, classification, and simple extraction tasks at near-Sonnet quality.
For deeper benchmarking notes, see our Claude model comparison or Anthropic's model card documentation.
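To reproduce the arithmetic in the scenarios that follow, here's a minimal cost helper. It's a sketch: the rates are copied from the table above, and `PRICES`/`estimate_cost` are our own illustration, not part of the Anthropic SDK.
```python
# $/MTok rates from the table above
PRICES = {
    "claude-opus-4-7":   {"input": 15.00, "output": 75.00, "cache_write": 18.75, "cache_read": 1.50},
    "claude-sonnet-4-6": {"input": 3.00,  "output": 15.00, "cache_write": 3.75,  "cache_read": 0.30},
    "claude-haiku-4-5":  {"input": 0.80,  "output": 4.00,  "cache_write": 1.00,  "cache_read": 0.08},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cache_write_tokens: int = 0, cache_read_tokens: int = 0) -> float:
    """Dollar cost of one request at the published per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"]
            + output_tokens * p["output"]
            + cache_write_tokens * p["cache_write"]
            + cache_read_tokens * p["cache_read"]) / 1_000_000
```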
What Does a Real Claude Agent Actually Cost Per Request?
Headline rates are abstract. Let's walk through three realistic scenarios with actual token counts measured from production agents instrumented with ClawPulse.
Scenario 1: A Simple Customer Support Bot (Sonnet 4.6)
A typical support turn looks like:
- System prompt: 1,200 tokens (instructions + tone guide)
- Conversation history: 2,500 tokens
- User message: 80 tokens
- Assistant response: 350 tokens
```
Input cost = (1200 + 2500 + 80) / 1_000_000 * $3.00 = $0.01134
Output cost = 350 / 1_000_000 * $15.00 = $0.00525
Total per turn ≈ $0.0166
```
A user with a 6-turn conversation costs roughly $0.10. At 10,000 conversations per day, that's $1,000/day or about $30,000/month — before any optimization.
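Plugging those volumes into the helper above confirms the math. This sketch keeps the same flat 2,500-token history; real histories grow every turn, so treat it as a floor:
```python
per_turn = estimate_cost("claude-sonnet-4-6", 1200 + 2500 + 80, 350)  # ≈ $0.0166
per_conversation = 6 * per_turn              # ≈ $0.10
per_month = per_conversation * 10_000 * 30   # ≈ $29,900
```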
Scenario 2: A RAG Agent With 8K-Token Context (Sonnet 4.6)
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,  # ~2,000 tokens
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": user_query_with_retrieved_docs}  # ~8,000 tokens
    ]
)
# Without caching: (2,000 + 8,000) * $3/1M input + 800 * $15/1M output ≈ $0.042
# With caching on the system prompt: ≈ $0.044 on the first call (cache writes
# cost 1.25x the input rate), ≈ $0.037 on every call after that
```
Per-request cost: ~$0.037 with caching, ~$0.042 without. That's a 13% reduction from three lines of code, and it only caches the 2K system prompt; if the retrieved document context repeats across requests, caching it too is where the larger savings come from.
Scenario 3: A Multi-Step Tool-Calling Agent (Opus 4.7)
Coding agents and complex planners often run 5-15 tool iterations per user request, each round re-sending the growing message history.
A representative coding agent run we measured with ClawPulse:
- 8 tool iterations
- 47,000 cumulative input tokens
- 4,200 output tokens
```
Input = 47000 / 1M * $15 = $0.705
Output = 4200 / 1M * $75 = $0.315
Total per task ≈ $1.02
```
A single Opus-powered coding task costs over a dollar. Without prompt caching this is unsustainable — see Anthropic's agent cost optimization guide for why caching is non-optional for agentic workloads.
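To quantify that, here's a toy model of the same 8-iteration loop. The growth numbers are our own assumptions, picked so cumulative input lands near the measured 47K: a ~2,000-token initial context plus ~1,100 new tokens per round, with every previously sent token billed at the cache-read rate once caching is on.
```python
IN, WRITE, READ = 15.00, 18.75, 1.50   # Opus $/MTok rates from the table

writes = [2000] + [1100] * 7           # tokens first sent at each of 8 iterations

# Without caching, every round re-sends the full history at the input rate
no_cache = sum(sum(writes[:i + 1]) for i in range(8)) * IN / 1e6         # ≈ $0.70

# With caching, new tokens pay the 1.25x write premium once, and the
# already-sent prefix is re-read at 10% of the input rate
with_cache = (sum(writes) * WRITE
              + sum(sum(writes[:i]) for i in range(1, 8)) * READ) / 1e6  # ≈ $0.24
```
Output cost is unchanged, so the full task drops from ~$1.02 to roughly $0.55 in this model: caching alone cuts the agent's input bill by about two thirds.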
The Three Levers That Actually Cut Your Bill
1. Prompt Caching (Up to 90% Savings on Repeated Context)
If you re-send the same system prompt or document context across requests, prompt caching turns those tokens into 10%-priced reads. The break-even is one cache hit — after a single reuse, you've already saved money.
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    // Stable blocks go first so the cached prefix matches across requests
    { type: "text", text: stableInstructions, cache_control: { type: "ephemeral" } },
    { type: "text", text: largeDocumentContext, cache_control: { type: "ephemeral" } }
  ],
  messages: [{ role: "user", content: userQuery }]
});
```
The cache TTL is 5 minutes by default (1 hour with the extended cache beta). For chat applications with active sessions, hit rates of 85%+ are typical.
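You can turn a hit rate into an effective input price with a one-line blend. This is our own back-of-envelope formula, not an official calculation: misses pay the 1.25x write premium, hits pay the 0.10x read rate.
```python
def effective_input_rate(base_rate: float, hit_rate: float) -> float:
    """Blended $/MTok for cacheable tokens at a given cache hit rate."""
    return base_rate * (1.25 * (1 - hit_rate) + 0.10 * hit_rate)

effective_input_rate(3.00, 0.85)  # ≈ $0.82/MTok on Sonnet, vs $3.00 uncached
```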
2. Model Routing (Send Cheap Tasks to Haiku)
Most "AI" requests don't need Opus or even Sonnet. A simple router that classifies intent first and dispatches accordingly will routinely cut costs 60-70%:
```python
import anthropic

client = anthropic.Anthropic()

def route(user_input: str) -> str:
    """Classify with cheap Haiku, then dispatch to the cheapest capable model."""
    classification = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=10,
        messages=[{"role": "user", "content": f"Classify: {user_input}\nReply: SIMPLE or COMPLEX"}]
    )
    return "claude-sonnet-4-6" if "COMPLEX" in classification.content[0].text else "claude-haiku-4-5"
```
The classification call costs ~$0.0001. If it correctly downgrades 70% of traffic from Sonnet to Haiku, you save roughly 70% × ($0.0166 − $0.0044) ≈ $0.0085 per request, or $85 per 10,000 requests.
3. Output Token Discipline
Output costs 5x input. Three concrete tactics:
- Set `max_tokens` aggressively. Most chat replies don't need 4,096.
- Ask for structured JSON instead of prose explanations when the consumer is a program, not a human.
- For long-form generation, use `stop_sequences` to terminate the moment you have what you need.
A team we work with cut their monthly bill from $18,000 to $11,000 by adding `max_tokens: 600` to their default chat config. No other change.
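Here's a minimal sketch of the first and third tactics combined. The `"\n\nExplanation:"` stop sequence and the `user_query` variable are our own placeholders, assuming a prompt that asks for an answer followed by an explanation you never read:
```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,                       # hard ceiling on output spend
    stop_sequences=["\n\nExplanation:"],  # stop once we have the part we need
    messages=[{"role": "user", "content": user_query}]
)
```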
Start monitoring your OpenClaw agents in 2 minutes
Free 14-day trial. No credit card. Just drop in one curl command.
Prefer a walkthrough? Book a 15-min demo.
How Claude Pricing Compares to GPT-4 and Gemini
Rough April 2026 input/output rates per million tokens:
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Claude Opus 4.7 | $15 | $75 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Haiku 4.5 | $0.80 | $4 |
| GPT-4o | $2.50 | $10 |
| GPT-4o mini | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $10 |
Sonnet 4.6 is more expensive than GPT-4o on paper, but in practice Sonnet's longer context window, better tool-calling reliability, and 90% cache discount often make it cheaper per successful task. Anthropic's own benchmark suite shows Sonnet 4.6 outperforming GPT-4o on SWE-bench Verified by ~12 points, which directly reduces retry costs.
Why Per-Request Pricing Isn't the Real Cost
The published per-token rate is what Anthropic charges. Your true cost includes:
- Retries when an agent loops or produces invalid tool calls
- Wasted tool iterations the agent took before reaching the answer
- Compaction overhead when conversations exceed the context window
- Unused output when you set `max_tokens` too high and the model rambles
This is exactly the visibility gap that observability platforms close. Langfuse, Helicone, and ClawPulse all surface per-request cost, but only ClawPulse correlates cost with agent outcome — i.e., you can see "this $1.20 task failed, this $0.40 task succeeded" and tune accordingly. See our pricing page for what monitoring at this granularity actually costs.
A Quick Cost Audit Checklist
Before you scale any Claude integration, run through this:
1. Are you using prompt caching on every stable system prompt? (docs)
2. Have you set realistic `max_tokens` on every endpoint?
3. Have you measured what % of requests actually need Opus or Sonnet vs. Haiku?
4. Do you log per-request input/output tokens to a dashboard?
5. Do you alert when a single user session exceeds a cost threshold?
Skipping any one of these is the difference between a $3,000/month bill and a $30,000/month bill.
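For items 4 and 5, the response object already carries everything you need. A minimal sketch follows; the rates are the Sonnet row from the table, and `log_cost` is our own helper to wire into whatever dashboard or alerting you use:
```python
RATES = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}

def log_cost(response) -> float:
    """Compute and log the dollar cost of one Messages API response."""
    u = response.usage
    cost = (u.input_tokens * RATES["input"]
            + u.output_tokens * RATES["output"]
            + (u.cache_creation_input_tokens or 0) * RATES["cache_write"]
            + (u.cache_read_input_tokens or 0) * RATES["cache_read"]) / 1_000_000
    print(f"${cost:.5f} (in={u.input_tokens}, out={u.output_tokens}, "
          f"cache_read={u.cache_read_input_tokens or 0})")
    return cost
```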
FAQ
Is the Claude API cheaper than ChatGPT API?
It depends on the tier. Claude Haiku 4.5 ($0.80/$4) is more expensive than GPT-4o mini ($0.15/$0.60), but Claude Sonnet 4.6 is competitive with GPT-4o for most tasks and Anthropic's prompt caching discount is more aggressive than OpenAI's, which often flips the comparison once cache hits accumulate.
Does Anthropic charge for failed requests?
No. If a request returns a 4xx or 5xx error, you are not billed. However, you are billed for tool-calling iterations that the agent generates even if the final result is unusable, which is why agent observability matters.
What's the cheapest way to use Claude in production?
Three rules: (1) use Haiku 4.5 by default and only escalate to Sonnet/Opus when needed, (2) enable prompt caching on every system prompt over 1,000 tokens, (3) cap `max_tokens` at the realistic ceiling for each endpoint. Together these typically cut bills by 70-85%.
How do I track Claude API costs across multiple projects?
The Anthropic Console shows aggregate spend, but for per-feature, per-user, or per-agent breakdown you need an observability layer. See our agent monitoring demo for how ClawPulse attributes every token to the upstream feature that generated it.
---
Ready to see exactly where your Claude API budget is going? Book a 15-minute ClawPulse demo and we'll instrument one of your agents live so you can watch costs surface per request, per user, and per outcome.