How much does the Claude API cost?

Claude API pricing is per token, billed in USD per million tokens (MTok). As of the latest update, Claude Haiku 4.5 is £0.79 ($1) input and £3.95 ($5) output per million tokens, Claude Sonnet 4.6 is £2.37 ($3) in and £11.85 ($15) out, Claude Opus 4.8 is £3.95 ($5) in and £19.75 ($25) out, and Claude Fable 5 is £7.90 ($10) in and £39.50 ($50) out. Output tokens cost about five times input tokens.

Is the Claude API cheaper than a Claude subscription?

It depends on volume. A Claude Pro subscription at around £14.20 ($18) a month (annual) is better value for steady individual chat use. The API is cheaper and more flexible for building applications, automations and high-volume or bursty workloads, because you pay only for the tokens you use, with no monthly minimum.

How can I reduce my Claude API bill?

Four levers: pick the smallest capable model (Haiku for simple tasks, Sonnet for most production work, Opus for the hardest reasoning); use prompt caching to pay 0.1x input price on repeated context; use the Batch API for a 50% discount on non-urgent work; and monitor token usage to find waste. Caching and batching discounts stack.

What is prompt caching and how much does it save?

Prompt caching reuses previously processed parts of your prompt. A cache read costs 0.1x the standard input price, so it pays off after one read for the 5-minute cache (a 1.25x write) or two reads for the 1-hour cache (a 2x write). For agents and chatbots that resend a large system prompt or document each turn, it can cut input costs dramatically.

Does Claude charge extra for tools and the 1M context window?

The full 1M-token context window is billed at standard per-token rates, with no premium. Most tools are billed as normal tokens; web fetch and code execution (when used with web search or web fetch) add no extra charge. Web search costs £7.90 ($10) per 1,000 searches plus token costs. Managed Agents add £0.06 ($0.08) per session-hour of runtime.

Claude API Pricing: Every Model Rate Explained (June 2026)

Quick Answer:

The Claude API charges per token, billed in US dollars per million tokens (MTok). As of June 2026, the headline rates are Haiku 4.5 at £0.79 ($1) in / £3.95 ($5) out, Sonnet 4.6 at £2.37 ($3) in / £11.85 ($15) out, Opus 4.8 at £3.95 ($5) in / £19.75 ($25) out, and the flagship Fable 5 at £7.90 ($10) in / £39.50 ($50) out. Output costs roughly 5x input. Prompt caching (0.1x reads) and the Batch API (50% off) can slash that, and the full 1M-token context window carries no premium.

Claude is priced by the token, not the seat, and that is the single most important thing to understand before you build on it. Get the model choice and the caching right, and the same workload can cost a tenth of what it would on default settings.

This is the complete, current breakdown of Claude API pricing: every model, every modifier, and the levers that actually move your bill. All figures are taken from Anthropic's official pricing page and verified for June 2026. Prices are billed in USD; the £ figures are approximate conversions at about £0.79 per US dollar.

How Claude API Pricing Works

Unlike a Claude Pro or Max subscription, which is a flat monthly fee for the chat apps, the Claude API is pure usage-based billing. You pay for what the model reads and writes, measured in tokens. A token is roughly four characters or about 0.75 of an English word, so 1,000 tokens is around 750 words.

Every price is quoted per million tokens, abbreviated MTok. There are two core numbers for each model: the input price (everything you send, including the system prompt, conversation history, documents and tool definitions) and the output price (everything the model generates). Output is consistently about five times more expensive than input across the range, because generation is the compute-heavy half.
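As a minimal sketch of that billing arithmetic, the Python below estimates tokens with the four-characters-per-token rule of thumb and prices a single request from separate input and output rates. The rates are the standard figures quoted in this article, and the model keys such as "sonnet-4.6" are hypothetical labels chosen for illustration, not official API model IDs.

```python
# Standard rates quoted in this article, USD per million tokens (MTok).
# The keys are illustrative labels, not official API model IDs.
RATES = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},
    "fable-5": {"input": 10.00, "output": 50.00},
}


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Input and output are billed separately, each per million tokens."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


if __name__ == "__main__":
    prompt = "Summarise the attached quarterly report in three bullet points. " * 50
    n_in = estimate_tokens(prompt)
    # Assume a 300-token reply for the example.
    print(f"{n_in} input tokens -> ${request_cost_usd('sonnet-4.6', n_in, 300):.5f}")
```

The character rule is only an approximation; for budgeting, prefer the token counts the API itself reports.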

On top of those two numbers sit a handful of modifiers: prompt caching (cheaper repeated context), the Batch API (half price for asynchronous work), Fast mode (a premium for speed), data residency (a 1.1x premium for US-only routing) and server-side tools (extra charges for things like web search). We will cover each below. One quirk worth flagging early: Opus 4.7 and later use a new tokenizer that can use up to 35% more tokens for the same text, so compare total cost, not just the per-token rate.

The Full Model Pricing Table

Here is the full Claude model line-up with standard API pricing, current for June 2026. All values are USD per million tokens. The cache columns are explained in the prompt caching section.

Model	Input	5m cache write	1h cache write	Cache read	Output
Claude Fable 5	$10	$12.50	$20	$1	$50
Claude Mythos 5 (limited)	$10	$12.50	$20	$1	$50
Claude Opus 4.8	$5	$6.25	$10	$0.50	$25
Claude Opus 4.7	$5	$6.25	$10	$0.50	$25
Claude Opus 4.6	$5	$6.25	$10	$0.50	$25
Claude Opus 4.5	$5	$6.25	$10	$0.50	$25
Claude Opus 4.1 (deprecated)	$15	$18.75	$30	$1.50	$75
Claude Sonnet 4.6	$3	$3.75	$6	$0.30	$15
Claude Sonnet 4.5	$3	$3.75	$6	$0.30	$15
Claude Haiku 4.5	$1	$1.25	$2	$0.10	$5
Claude Haiku 3.5 (retired)	$0.80	$1	$1.60	$0.08	$4

Standard Claude API pricing, USD per million tokens. Source: Anthropic official pricing page, June 2026.

The shape of the range is clear. Haiku 4.5 is the budget workhorse at £0.79 ($1) input and £3.95 ($5) output. Sonnet 4.6, the default for most production work, sits at £2.37 ($3) in and £11.85 ($15) out. Opus 4.8, the flagship for hard reasoning, is £3.95 ($5) in and £19.75 ($25) out, notably a third of the price of the older Opus 4.1. The Mythos-class pair, Fable 5 and Mythos 5, sit at the top at £7.90 ($10) in and £39.50 ($50) out, though Fable 5 access has its own complications worth reading about in our Fable 5 suspension coverage.

Retired and deprecated models such as Opus 4, Sonnet 4 and Haiku 3.5 remain on Anthropic's full price sheet for reference and for the cloud platforms that still serve them, but new builds should use the current generation.

Batch API Pricing (50% Off)

If your work is not time-sensitive, the Batch API is the single biggest easy win: it processes large volumes asynchronously at a flat 50% discount on both input and output. Submit a job, collect the results within the processing window, and pay half. The headline batch rates:

Model	Batch input	Batch output
Claude Fable 5	$5	$25
Claude Opus 4.8	$2.50	$12.50
Claude Sonnet 4.6	$1.50	$7.50
Claude Haiku 4.5	$0.50	$2.50

Batch API pricing, USD per million tokens (50% off standard). Source: Anthropic.

Batch is ideal for classification, summarisation, extraction, evaluation runs, content generation and any backlog you can process overnight. The discount stacks with prompt caching, so a cached, batched job is dramatically cheaper than naive real-time calls.

Prompt Caching Explained

Prompt caching is the lever most teams underuse. Instead of reprocessing the same large system prompt, document or conversation history on every request, the API stores it and reads it back at a fraction of the input price. There are three cache prices, expressed as multipliers of the base input rate:

5-minute cache write: 1.25x base input price, valid for five minutes.
1-hour cache write: 2x base input price, valid for one hour.
Cache read (hit): 0.1x base input price, available for as long as the cached entry lasts (five minutes or one hour, matching the write).

The maths is friendly. Because a read is just 10% of the input price, the 5-minute cache pays for itself after a single read (the 1.25x write is recouped almost immediately), and the 1-hour cache pays off after two reads. For a chatbot or agent that resends a 50,000-token system prompt and document on every turn, caching can turn the dominant cost line into a rounding error. Caching multipliers stack with the Batch discount and data residency.
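The break-even claim is easy to verify with a few lines of Python. This is a sketch under one simplifying assumption: every request after the first lands inside the cache lifetime, so it is always a hit. The 50,000-token prefix and 20-turn run in the demo are illustrative figures, not measurements.

```python
from typing import Optional

CACHE_READ_MULTIPLIER = 0.1  # cache hits cost 0.1x the base input rate


def cumulative_cost(n_requests: int, prefix_tokens: int, input_rate: float,
                    write_multiplier: Optional[float]) -> float:
    """USD spent sending the same prefix on n requests.

    write_multiplier=None: no caching, every request pays 1x input.
    Otherwise the first request writes the cache (1.25x or 2x) and every
    later request is assumed to be a hit inside the cache lifetime.
    """
    base = prefix_tokens / 1_000_000 * input_rate
    if write_multiplier is None:
        return n_requests * base
    return base * (write_multiplier + CACHE_READ_MULTIPLIER * (n_requests - 1))


def reads_to_break_even(write_multiplier: float) -> int:
    """Smallest number of cache reads after which caching beats not caching."""
    reads = 0
    while True:
        n = 1 + reads  # one write followed by `reads` hits
        cached = cumulative_cost(n, 1_000_000, 1.0, write_multiplier)
        uncached = cumulative_cost(n, 1_000_000, 1.0, None)
        if cached < uncached:
            return reads
        reads += 1


if __name__ == "__main__":
    print("5-minute cache breaks even after", reads_to_break_even(1.25), "read(s)")
    print("1-hour cache breaks even after", reads_to_break_even(2.0), "read(s)")
    # A 50,000-token prefix on Sonnet 4.6 ($3/MTok input) across 20 turns.
    print(cumulative_cost(20, 50_000, 3.0, None), cumulative_cost(20, 50_000, 3.0, 1.25))
```

With those inputs the 20-turn prefix costs $3.00 uncached and about $0.47 with the 5-minute cache. A cache miss means paying the write again, so the real saving depends on how reliably requests arrive inside the window.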

The 1M Context Window

This is a pleasant surprise: Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6 and Sonnet 4.6 all include the full 1 million-token context window at standard pricing, with no long-context premium. A 900,000-token request is billed at the same per-token rate as a 9,000-token one. Prompt caching and batch discounts apply at standard rates across the whole window. That makes large-document and whole-codebase workflows far more predictable than on platforms that charge a surcharge above a threshold.

Fast Mode Pricing

Fast mode, in research preview, gives significantly faster output for the Opus line at a premium. It applies across the full context window, including requests over 200k input tokens, and is not available on the Batch API or Claude Platform on AWS.

Model	Fast input	Fast output
Claude Opus 4.8	$10	$50
Claude Opus 4.6 / 4.7	$30	$150

Fast mode pricing, USD per million tokens. Caching and data-residency multipliers stack on top. Source: Anthropic.

Note how much cheaper fast mode is on Opus 4.8 (£7.90/$10 in, £39.50/$50 out) than on the older 4.6 and 4.7 (£23.70/$30 in, £118.50/$150 out). For latency-sensitive premium work, 4.8 is now both the smartest and the most economical fast option.

Tool Use & Server Tools

Tool use is priced as normal tokens, plus a small system-prompt overhead and, for some server-side tools, a usage charge. Defining tools adds tokens (names, descriptions and schemas), and Claude adds a tool-use system prompt of roughly 290 to 805 tokens depending on the model and tool choice. The specific add-ons:

Web search: £7.90 ($10) per 1,000 searches, plus standard token costs for the results. Each search counts once regardless of results returned; failed searches are not billed.
Web fetch: no additional charge, you pay only for the fetched content as input tokens. Use max_content_tokens to cap large pages.
Code execution: free when used with web search or web fetch. Otherwise billed by container time after 1,550 free hours per month, then £0.04 ($0.05) per hour per container, with a five-minute minimum.
Bash tool: adds 245 input tokens per call, plus the size of any command output.
Text editor tool: adds about 700 input tokens (Claude 4.x).
Computer use: adds 466 to 499 system-prompt tokens plus 735 tokens per tool definition, plus screenshot vision tokens.

The practical takeaway: client-side tools are essentially free beyond the tokens they add, while web search is the one server tool with a real per-call price. Budget for it if your agent searches heavily.

Claude Managed Agents

Claude Managed Agents are billed on two dimensions: tokens at the standard model rates above, plus session runtime at £0.06 ($0.08) per session-hour. Runtime is metered to the millisecond and only accrues while the session status is running; idle, rescheduling and terminated time is free. Prompt caching applies; the Batch discount, Fast mode and data-residency multipliers do not, because sessions are stateful and interactive. Session runtime replaces separate container-hour billing.

A worked example from Anthropic: a one-hour Opus 4.8 coding session using 50,000 input and 15,000 output tokens costs £0.20 ($0.25) for input, £0.30 ($0.375) for output and £0.06 ($0.08) for runtime, about £0.56 ($0.705) total. With 40,000 of those input tokens served from cache, the total drops to roughly £0.41 ($0.525).
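The worked example can be reproduced in a few lines. This sketch uses the Opus 4.8 rates and the $0.08 session-hour figure from this section; like Anthropic's example, it prices cached tokens as reads and leaves out the one-off cost of writing them to the cache. The function name is hypothetical.

```python
# Opus 4.8 rates from this article, USD per million tokens.
OPUS_48 = {"input": 5.00, "cache_read": 0.50, "output": 25.00}
RUNTIME_PER_SESSION_HOUR = 0.08  # accrues only while the session is running


def session_cost(input_tokens: int, output_tokens: int, running_hours: float,
                 cached_input_tokens: int = 0) -> float:
    """Managed Agents session: token cost at model rates plus runtime.

    Cached input is priced as cache reads; the earlier cache write is not included.
    """
    fresh = input_tokens - cached_input_tokens
    tokens_usd = (fresh * OPUS_48["input"]
                  + cached_input_tokens * OPUS_48["cache_read"]
                  + output_tokens * OPUS_48["output"]) / 1_000_000
    return tokens_usd + running_hours * RUNTIME_PER_SESSION_HOUR


if __name__ == "__main__":
    print(f"${session_cost(50_000, 15_000, 1.0):.3f}")  # uncached
    print(f"${session_cost(50_000, 15_000, 1.0, cached_input_tokens=40_000):.3f}")  # 40k cached
```

Both results match Anthropic's figures of $0.705 and $0.525; note that output, not runtime, is the largest line in either case.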

Bedrock, Vertex & AWS Pricing

Claude is available beyond the first-party API. On Amazon Bedrock and Google Vertex AI the cloud provider invoices you, and from Sonnet 4.5, Haiku 4.5 and Opus 4.5 onward, regional and multi-region endpoints carry a 10% premium over global endpoints. The first-party Claude API is global by default.

Claude Platform on AWS bills through AWS Marketplace in Claude Consumption Units (CCUs), where 100 CCU equals £0.79 ($1) of usage at standard rates after any negotiated discount. It is postpaid, metered hourly, with discounts applied as fewer CCUs. Separately, on Opus 4.6, Sonnet 4.6 and later, choosing US-only routing via the inference_geo parameter applies a 1.1x multiplier across all token categories; global routing (the default) uses standard pricing.
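The modifiers in this section are plain multipliers, sketched below using the figures stated here: US-only routing via inference_geo multiplies every token category by 1.1, Bedrock and Vertex regional endpoints add 10% over global, and Claude Platform on AWS expresses $1 of standard-rate usage as 100 CCU. The function names are hypothetical, and negotiated AWS discounts are left out.

```python
US_ONLY_MULTIPLIER = 1.1         # first-party inference_geo US-only routing (Opus 4.6, Sonnet 4.6 and later)
REGIONAL_ENDPOINT_PREMIUM = 1.1  # Bedrock/Vertex regional vs global endpoints (from the 4.5 models onward)
CCU_PER_USD = 100                # Claude Platform on AWS: 100 CCU = $1 at standard rates


def routed_cost_usd(base_usd: float, us_only: bool = False) -> float:
    """First-party API: global routing is standard price, US-only adds 1.1x."""
    return base_usd * (US_ONLY_MULTIPLIER if us_only else 1.0)


def cloud_endpoint_cost_usd(base_usd: float, regional: bool) -> float:
    """Bedrock/Vertex: regional or multi-region endpoints carry a 10% premium."""
    return base_usd * (REGIONAL_ENDPOINT_PREMIUM if regional else 1.0)


def usd_to_ccu(usage_usd: float) -> float:
    """Convert standard-rate usage to Claude Consumption Units."""
    return usage_usd * CCU_PER_USD


if __name__ == "__main__":
    print(routed_cost_usd(100.0, us_only=True), cloud_endpoint_cost_usd(100.0, regional=True))
    print(usd_to_ccu(2.5), "CCU")
```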

API vs Subscription: Which Is Cheaper?

This is the question most people actually want answered. The Claude apps (Pro at about £14.20/$18 a month on annual billing, Max from around £79/$100) are a flat fee for interactive chat, projects and the desktop tools. The API is metered. The rule of thumb:

Choose a subscription if you are one person chatting with Claude all day. The flat fee caps your cost and you get the polished apps, Cowork and Claude Code allowances.
Choose the API if you are building a product, automating a workflow, serving many users, or running bursty/batch jobs. You pay only for tokens used, scale to zero when idle, and unlock caching and batching.

For a fuller breakdown of the consumer plans, see our Claude pricing guide, and for how Claude stacks up on price against rivals, our Claude vs ChatGPT, Gemini and Grok comparison.

Worked Cost Examples

Numbers make this concrete. A few realistic scenarios at current rates:

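10,000 support tickets on Haiku 4.5 at roughly 3,700 tokens each is 37 million tokens, about £29 ($37) if every token were billed at the input rate; the output share of each conversation pushes the real total higher, but these are still the unit economics that make AI support viable.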
A Sonnet 4.6 chatbot sending a 20,000-token system prompt and getting a 500-token reply costs about £0.05 ($0.068) per turn uncached; on a cache hit, that repeated context drops to 0.1x and the per-turn cost falls to roughly £0.01 ($0.014), a cut of about 80%.
An Opus 4.8 research run reading a 200,000-token corpus and writing a 5,000-token report costs about £0.89 ($1.13) for that single deep call, no long-context surcharge.
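To check the arithmetic behind these scenarios, here is a minimal Python sketch. It uses the standard rates from the pricing table, treats the support tickets as if every token were billed at the input rate (so it gives a floor rather than a forecast), and prices the cached chatbot turn as a pure cache hit without the one-off cache write. The model keys are illustrative labels.

```python
# USD per MTok: (input, cache read, output), standard rates from this article.
# Keys are illustrative labels, not API model IDs.
RATES = {
    "haiku-4.5": (1.00, 0.10, 5.00),
    "sonnet-4.6": (3.00, 0.30, 15.00),
    "opus-4.8": (5.00, 0.50, 25.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Price one call; cached_tokens is the part of the input served as cache reads."""
    inp, read, out = RATES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * inp + cached_tokens * read + output_tokens * out) / 1_000_000


# Support: 10,000 tickets x 3,700 tokens, all priced at the input rate (a floor).
support_floor = 10_000 * call_cost("haiku-4.5", 3_700, 0)
# Chatbot turn: 20,000-token system prompt + 500-token reply, uncached vs cache hit.
chat_uncached = call_cost("sonnet-4.6", 20_000, 500)
chat_cached = call_cost("sonnet-4.6", 20_000, 500, cached_tokens=20_000)
# Research: 200,000-token corpus in, 5,000-token report out.
research = call_cost("opus-4.8", 200_000, 5_000)

if __name__ == "__main__":
    print(f"support ${support_floor:.2f}, chat ${chat_uncached:.4f} -> ${chat_cached:.4f}, "
          f"research ${research:.3f}")
```

On these assumptions the script gives $37 for the support floor, $0.0675 versus $0.0135 per chatbot turn (an 80% cut), and $1.125 for the research call.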

The pattern is always the same: model choice sets the order of magnitude, caching and batching cut the recurring cost, and output length is the multiplier to watch.

Pricing by Use Case: Five Real Scenarios

Abstract per-token rates only become intuitive when you map them to real products. Here are five common build patterns, the sensible model choice, and where the cost actually lands.

1. Customer-support assistant. Most tickets are routine, so the right tool is Haiku 4.5 with a cached knowledge base. At roughly 3,700 tokens a conversation, ten thousand tickets cost about £29 ($37) in total, and caching the support knowledge base drives the recurring input cost towards zero. Escalate only the genuinely tricky tickets to Sonnet. This is the canonical example of AI economics working in your favour: a task that would cost a human team hours resolves for pennies each.

2. Retrieval-augmented generation (RAG) over your docs. Here the cost driver is the retrieved context you stuff into each prompt. Sonnet 4.6 is the natural default. The single most important move is caching: if your system prompt and a core set of documents are stable, cache them so each query pays 0.1x on that bulk and full input price only on the user's actual question. Without caching, a RAG app that injects 30,000 tokens of context per query gets expensive fast; with it, the same app is comfortably affordable.

3. Coding agent. Coding is reasoning-heavy and benefits from the strongest models, so this is where Opus 4.8 earns its £19.75 ($25) output rate, or a Sonnet-to-Opus cascade for cost control. Expect output tokens to dominate, because the agent writes code, diffs and explanations. Cap output where you can, cache the repository context that does not change between steps, and consider Managed Agents if you want the £0.06 ($0.08) per session-hour runtime model rather than wiring the loop yourself.

4. Bulk document processing. Summarising, classifying or extracting from a large corpus is the textbook Batch API use case. Run it on Haiku or Sonnet, submit as a batch for the flat 50% discount, and cache any shared instructions. A job that would cost £200 ($253) in real-time calls drops to £100 ($127) on batch alone, before caching. Because it is asynchronous, the latency cost is zero to you.

5. High-volume content generation. Generating product descriptions, metadata or variations at scale lives and dies on output cost, since output is 5x input. Use the smallest model that holds quality (often Haiku, sometimes Sonnet), keep prompts tight, set firm max_tokens limits, and batch the run. The discipline here is ruthless output control: every unnecessary sentence is a line item multiplied by your volume.

Across all five, the same three levers recur: choose the smallest capable model, cache the stable context, and batch anything that can wait. Master those and the rate card stops being intimidating and becomes a set of dials you control.

How Claude API Pricing Compares to GPT and Gemini

Claude is rarely the cheapest option on a pure headline per-token basis, and it is not trying to be. Anthropic positions on capability per pound rather than raw price, and the tiered line-up is designed so you almost never have to overpay: you drop down to Haiku or Sonnet for the work that does not need a flagship. That tiering is the real story when you compare vendors, because a like-for-like comparison depends entirely on which model you would actually run for a given task.

Against OpenAI, the rough shape is that GPT's mid-tier and Claude's Sonnet trade blows on price, while at the top end Opus 4.8 at £3.95 ($5) input and £19.75 ($25) output is competitive with, and often cheaper than, rival flagships, especially since Anthropic cut Opus pricing by two-thirds from the old 4.1 generation. Against Google, Gemini's Flash tier tends to undercut everyone on the absolute floor, which is why high-volume, low-complexity jobs sometimes land there. Rival rates move frequently, so rather than quote numbers that will be stale within weeks, we keep a living side-by-side in our Claude vs ChatGPT, Gemini and Grok comparison.

Two structural advantages tip many real-world bills in Claude's favour regardless of the sticker price. First, the full 1M-token context window carries no long-context premium, where some rivals surcharge above a threshold. Second, prompt caching at 0.1x reads is aggressive and easy to apply, so agentic and chat workloads that resend large context get cheaper on Claude than a naive per-token comparison suggests. The lesson: compare the total cost of the model you would deploy, with caching and batching applied, not the top-line input rate.

Vision & Image Input Costs

Images are not free, and they are not billed as a flat per-image fee either. When you send an image to Claude, it is converted into input tokens based on its dimensions, and those tokens are billed at the model's standard input rate. A larger image uses more tokens, so resolution directly drives cost. This matters most for two workloads: document and screenshot analysis, and computer use, where every screenshot Claude takes is re-fed as image tokens.

The practical implications are easy to miss in a budget. A computer-use agent that captures a screenshot on every step can accumulate image tokens quickly, on top of the 466 to 499 system-prompt tokens and 735 tokens per tool definition that computer use already adds. Vision-heavy pipelines should downscale images to the smallest resolution that preserves the detail the task needs, crop to the region of interest, and avoid resending unchanged images turn after turn. Where the same reference image recurs, prompt caching applies to it just like text, so a cached image is read back at 0.1x.

For high-volume vision work, the model choice lever is sharper still. Running optical-character-recognition-style extraction or simple visual classification on Haiku 4.5 at £0.79 ($1) per million input tokens is a fraction of doing the same on Opus, and for most structured-extraction tasks the quality gap is negligible. Reserve Opus vision for genuinely hard visual reasoning, such as interpreting a complex chart or a dense scientific figure.
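The sketch below estimates image cost with tokens ≈ (width × height) / 750, the rule of thumb Anthropic has published in its vision documentation for images within its size limits. Treat it as an estimate and confirm against the current docs, since the API resizes oversized images and newer models may count differently. The downscale helper is a hypothetical client-side step showing why smaller images are cheaper.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate image tokens as (width * height) / 750.

    Rule of thumb for images already within the API's size limits;
    larger images are resized by the API before counting.
    """
    return round(width_px * height_px / 750)


def downscale(width_px: int, height_px: int, max_long_edge: int) -> tuple[int, int]:
    """Shrink to a target long edge, preserving aspect ratio (never upscales)."""
    long_edge = max(width_px, height_px)
    if long_edge <= max_long_edge:
        return width_px, height_px
    scale = max_long_edge / long_edge
    return round(width_px * scale), round(height_px * scale)


def image_cost_usd(tokens: int, input_rate_per_mtok: float) -> float:
    """Image tokens are billed at the model's standard input rate."""
    return tokens * input_rate_per_mtok / 1_000_000


if __name__ == "__main__":
    full = estimate_image_tokens(1000, 1000)
    small = estimate_image_tokens(*downscale(1000, 1000, 500))
    for name, rate in [("Haiku 4.5", 1.00), ("Opus 4.8", 5.00)]:
        print(name, f"full ${image_cost_usd(full, rate):.5f}",
              f"downscaled ${image_cost_usd(small, rate):.5f}")
```

A 1000×1000 image comes out at about 1,333 tokens, and halving each side brings it to about 333, roughly a 4x saving before the model choice multiplies it further.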

The Tokenizer Shift & Migration Costs

Here is the subtle trap that catches teams upgrading models: Opus 4.7 and later use a new tokenizer, and Anthropic states it may use up to 35% more tokens for the same fixed text. The per-token price has not changed, but if the same prompt now tokenises into more tokens, the same task can cost more even at an identical rate card. This is the clearest example of why you should always compare total cost per task, not the headline per-million-token figure.

Concretely, migrating a workload from an older Opus to Opus 4.8 can raise token counts on identical inputs, partially offsetting the headline saving from the cheaper rate. In most cases the newer model's stronger performance, shorter reasoning and better instruction-following more than compensate, because it gets to the answer in fewer output tokens, and output is the expensive half. But you should benchmark on your own prompts rather than assume. Run a representative sample through both models, measure total input plus output tokens and the resulting cost, and decide on the evidence.
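A minimal sketch of that comparison is below. The 40,000 input and 4,000 output token counts are hypothetical, and applying the full 35% inflation to both sides is the pessimistic case; in practice, replace the inflated figures with counts measured on your own prompts.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Total USD for one task, from measured token counts and per-MTok rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def inflate(tokens: int, factor: float = 1.35) -> int:
    """Worst-case recount under a tokenizer that uses up to 35% more tokens."""
    return round(tokens * factor)


if __name__ == "__main__":
    # Hypothetical task measured on the old tokenizer.
    old_in, old_out = 40_000, 4_000
    opus_41 = cost_per_task(old_in, old_out, 15.00, 75.00)
    opus_46 = cost_per_task(old_in, old_out, 5.00, 25.00)
    opus_48_worst = cost_per_task(inflate(old_in), inflate(old_out), 5.00, 25.00)
    print(f"Opus 4.1 ${opus_41:.3f} | Opus 4.6 ${opus_46:.3f} | Opus 4.8 worst ${opus_48_worst:.3f}")
```

With these counts the task costs $0.90 on Opus 4.1, $0.30 on Opus 4.6 and at worst $0.405 on Opus 4.8: 35% more than 4.6 on an identical rate card, yet still under half the 4.1 price.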

The same caution applies to cross-vendor migration. Token counts are not comparable between providers because each uses a different tokenizer, so a prompt that is 1,000 tokens on one platform may be 1,100 on another. Always re-measure when you move a workload, and rebuild your cost model on the new platform's actual token counts.

How Claude's Prices Have Changed Over Time

The direction of travel has been strongly downward at the top of the range, which is unusual and worth understanding. The Opus 4.1 generation cost £11.85 ($15) input and £59.25 ($75) output per million tokens. From Opus 4.5 onward, Anthropic cut that to £3.95 ($5) input and £19.75 ($25) output, a two-thirds reduction for the flagship tier, and held it through 4.6, 4.7 and 4.8. That is a remarkable deflation for frontier capability in barely a year, driven by efficiency gains and competitive pressure.

Lower down the range the picture is more stable. Sonnet has sat at £2.37 ($3) input and £11.85 ($15) output across 4, 4.5 and 4.6. Haiku actually edged up slightly, from £0.63 ($0.80) input on the retired 3.5 to £0.79 ($1) on 4.5, reflecting the much greater capability of the new small model. The Mythos-class tier (Fable 5 and Mythos 5) launched at £7.90 ($10) input and £39.50 ($50) output, less than half the price of the earlier Mythos Preview, signalling Anthropic's intent to make its most powerful tier broadly affordable rather than a luxury good.

The takeaway for planning: do not over-commit to a fixed multi-year cost model based on today's flagship price. If the recent pattern holds, the capability you pay a premium for now will likely be cheaper, or available in a smaller, cheaper model, within a couple of release cycles. Architect so you can swap models as the price-performance frontier moves.

Hidden Costs & Gotchas to Budget For

The rate card is honest, but several real costs hide between the lines of a quick estimate. Budget for these before you are surprised by an invoice:

Output dominates. At 5x the input price, a chatty model is an expensive one. Verbose system prompts that encourage long answers, or missing max_tokens caps, quietly multiply your bill.
Conversation history compounds. In a multi-turn chat, every prior turn is resent as input on the next request. A long conversation can cost far more on its tenth turn than its first, unless you cache or summarise the history.
The tool-use system prompt. Simply enabling tools adds roughly 290 to 805 tokens of system prompt per request depending on model and tool choice, before any tool actually runs.
Server tool charges. Web search at £7.90 ($10) per 1,000 searches is the one to watch in a search-heavy agent; the searches and their returned content both add up.
Regional and US-only premiums. Bedrock and Vertex regional endpoints add 10% over global, and the first-party inference_geo: "us" option adds a 1.1x multiplier across every token category. Data residency has a real price.
Retries and failures. Failed generations that you retry are billed for the tokens they consumed. Robust back-off and idempotency keep retry costs down. (Web searches that error are not billed.)
Container time for code execution. Beyond the 1,550 free hours a month, code execution is £0.04 ($0.05) per container-hour with a five-minute minimum, and files preloaded onto a container are billed even if the tool is never invoked.

None of these is hidden in the unfair sense: they are all documented. But they are easy to omit from a back-of-envelope figure, and together they can move a real bill by a meaningful margin.
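Conversation history is the easiest of these to underestimate, so here is a minimal sketch of how it compounds. It assumes fixed-size messages, Sonnet 4.6 rates and no caching or summarisation; the message sizes are illustrative, not measured.

```python
def per_turn_costs(system_tokens: int, user_tokens: int, reply_tokens: int,
                   turns: int, input_rate: float, output_rate: float) -> list[float]:
    """USD cost of each request in a chat that resends its full history uncached.

    Assumes every exchange has a fixed-size user message and reply.
    """
    costs = []
    for t in range(turns):
        # Request t carries the system prompt, t earlier exchanges and the new message.
        input_tokens = system_tokens + t * (user_tokens + reply_tokens) + user_tokens
        costs.append((input_tokens * input_rate + reply_tokens * output_rate) / 1_000_000)
    return costs


if __name__ == "__main__":
    # Sonnet 4.6 rates; 2,000-token system prompt, 200-token messages, 500-token replies.
    costs = per_turn_costs(2_000, 200, 500, 10, 3.00, 15.00)
    print(f"turn 1 ${costs[0]:.4f}, turn 10 ${costs[-1]:.4f}, total ${sum(costs):.4f}")
```

Here the tenth request costs $0.033 against about $0.014 for the first, because the input grows by 700 tokens every turn while the output stays flat. Caching the stable prefix or summarising older turns is what flattens that curve.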

Estimating & Forecasting Your Spend

To forecast Claude API costs before you build, work from three numbers per request: expected input tokens, expected output tokens, and request volume. Estimate tokens with the rule that one token is about four characters, or roughly 0.75 of an English word, so 1,000 words is around 1,300 tokens. For documents, a rough guide from Anthropic's own figures: a typical 10kB web page is about 2,500 tokens, a 100kB documentation page about 25,000 tokens, and a 500kB research-paper PDF about 125,000 tokens.

Multiply through and you have a per-request cost; multiply by volume for a monthly figure. Then apply the discounts you will realistically use: if 70% of your input is stable, cached context, recalculate that portion at 0.1x. If a chunk of work is asynchronous, halve it with batch. The gap between the naive estimate and the optimised one is usually large, which is exactly why the optimisation steps below are worth the engineering time.
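A minimal forecasting sketch following those steps is below. It assumes a fixed token profile per request, treats cached_fraction as the share of input served as cache reads, and applies the batch discount to a share of requests; it ignores cache-write premiums, tool overhead and retries, so read its optimised figure as optimistic. The function and parameter names are hypothetical.

```python
def monthly_forecast(requests: int, input_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float,
                     cached_fraction: float = 0.0, batch_fraction: float = 0.0) -> tuple[float, float]:
    """Return (naive, optimised) monthly spend in USD.

    cached_fraction: share of input tokens billed as cache reads (0.1x input rate).
    batch_fraction: share of requests sent through the Batch API (50% off).
    """
    naive_per_req = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    cached_in = input_tokens * cached_fraction
    fresh_in = input_tokens - cached_in
    opt_per_req = (fresh_in * input_rate + cached_in * input_rate * 0.1
                   + output_tokens * output_rate) / 1_000_000
    # Batch halves both input and output on the batched share; discounts stack.
    opt_per_req *= (1 - batch_fraction) + batch_fraction * 0.5
    return naive_per_req * requests, opt_per_req * requests


if __name__ == "__main__":
    # 100,000 Sonnet 4.6 requests a month, 10,000 input and 500 output tokens each.
    print(monthly_forecast(100_000, 10_000, 500, 3.0, 15.0, cached_fraction=0.7))
    print(monthly_forecast(100_000, 10_000, 500, 3.0, 15.0, cached_fraction=0.7, batch_fraction=0.5))
```

For that profile the naive figure is $3,750 a month; caching 70% of the input brings it to $1,860, and batching half the traffic on top brings it to $1,395.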

Finally, instrument from day one. Every Claude API response returns a usage object with input, output, cache-read and cache-creation token counts, plus server-tool usage. Log these per feature and per customer so you can see where the money actually goes, attribute cost to value, and catch regressions, such as a prompt change that doubled output length, before they show up on the invoice. Forecasting gets you in the right order of magnitude; measurement keeps you there.
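Logging per request then needs only a small pricing function. The sketch below reads the usage fields as documented for Anthropic's Messages API (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens and server_tool_use.web_search_requests); confirm those names against the current API reference. It assumes the usage block has already been converted to a plain dict and that all cache writes used the 5-minute TTL. Tagging by feature and customer is left to your own logging.

```python
# Sonnet 4.6 rates from this article, USD per million tokens.
SONNET_46 = {"input": 3.00, "cache_write_5m": 3.75, "cache_read": 0.30, "output": 15.00}
WEB_SEARCH_USD_PER_1000 = 10.00


def cost_from_usage(usage: dict, rates: dict) -> float:
    """Price one response from its usage block.

    input_tokens counts only uncached input; cache writes and reads are reported
    separately. Cache writes are assumed to use the 5-minute TTL (1.25x input).
    """
    token_usd = (
        (usage.get("input_tokens") or 0) * rates["input"]
        + (usage.get("cache_creation_input_tokens") or 0) * rates["cache_write_5m"]
        + (usage.get("cache_read_input_tokens") or 0) * rates["cache_read"]
        + (usage.get("output_tokens") or 0) * rates["output"]
    ) / 1_000_000
    searches = (usage.get("server_tool_use") or {}).get("web_search_requests") or 0
    return token_usd + searches * WEB_SEARCH_USD_PER_1000 / 1000


if __name__ == "__main__":
    example_usage = {  # illustrative values, not a real response
        "input_tokens": 1_200,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 20_000,
        "output_tokens": 500,
        "server_tool_use": {"web_search_requests": 2},
    }
    print(f"${cost_from_usage(example_usage, SONNET_46):.4f}")
```

In this illustrative response the two web searches ($0.02) cost more than all the tokens combined ($0.0171), which is exactly the kind of detail per-request logging surfaces.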

How to Cut Your Claude API Bill

Four moves, in order of impact for most teams:

Right-size the model. Use Haiku for classification, extraction and simple chat; Sonnet for the bulk of production work; Opus only for the genuinely hard reasoning. Moving a task from Opus 4.8 to Sonnet 4.6 cuts the per-token rate by 40%; moving it to Haiku 4.5 is a 5x saving.
Cache aggressively. Any stable system prompt, document or tool schema that you resend should be cached. Reads at 0.1x are the cheapest tokens on the platform.
Batch what can wait. Move evals, backfills and non-urgent generation to the Batch API for an instant 50% cut.
Trim output. Output is 5x input. Ask for concise responses, use stop sequences, and cap max_tokens. Shorter answers are cheaper answers.

Stack these and a workload that looked expensive on a napkin becomes comfortably affordable in production.

Rate Limits & Tiers

Pricing is per token, but throughput is governed by usage tiers. New accounts start at Tier 1 with basic limits and climb through Tiers 2 to 4 as spend and history grow, with Enterprise offering custom limits. New users receive a small amount of free credits to test the API. Volume discounts, academic and research pricing, and bespoke enterprise terms are negotiated case by case through Anthropic sales. Billing is on actual monthly usage, in USD, with card or invoice options and live tracking in the Claude Console.

Choosing the Right Model for the Job

Because the price difference between tiers is so large, model selection is the highest-leverage cost decision you will make, and the good news is that the right choice is usually obvious once you frame the task honestly. The question is not "which is the best model?" but "what is the cheapest model that clears the bar for this specific job?"

Reach for Haiku 4.5 (£0.79/$1 in, £3.95/$5 out) for high-volume, well-defined work: classification, tagging, routing, data extraction, simple summarisation, first-line support replies, and moving structured data around. At this price you can run it at scale, and for these tasks the accuracy is typically indistinguishable from a flagship.

Default to Sonnet 4.6 (£2.37/$3 in, £11.85/$15 out) for the bulk of production application logic: drafting and editing, retrieval-augmented generation over your documents, coding assistance, multi-step workflows and most agentic tasks. Sonnet is the sweet spot where capability meets cost for the majority of real products, and it is where you should start unless you have a reason to move.

Escalate to Opus 4.8 (£3.95/$5 in, £19.75/$25 out) only for genuinely hard reasoning: complex multi-file engineering, deep research synthesis, intricate planning, and tasks where a wrong answer is expensive. A powerful pattern is to route dynamically, handle the easy 80% on Haiku or Sonnet and escalate the hard 20% to Opus, so you pay flagship rates only when the problem actually demands it. The Mythos-class tier sits above even that, for the rare frontier workloads that justify £7.90/$10 input.

This tiered routing, often called a model cascade, is the difference between an AI feature with healthy margins and one that quietly loses money. Build the cascade in from the start rather than defaulting every call to your most capable model.
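A quick way to see what a cascade is worth is to price a traffic mix. This sketch assumes every request has the same token profile and ignores the cost of any cheaper attempt that fails before escalating, so treat its result as an upper bound on the saving. The model keys and the 80/20 splits are illustrative.

```python
# USD per MTok: (input, output), standard rates from this article.
RATES = {"haiku-4.5": (1.00, 5.00), "sonnet-4.6": (3.00, 15.00), "opus-4.8": (5.00, 25.00)}


def blended_cost(mix: dict, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Total USD for a traffic mix; `mix` maps model -> share of requests."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    total = 0.0
    for model, share in mix.items():
        inp, out = RATES[model]
        per_request = (input_tokens * inp + output_tokens * out) / 1_000_000
        total += share * requests * per_request
    return total


if __name__ == "__main__":
    for mix in ({"opus-4.8": 1.0},
                {"sonnet-4.6": 0.8, "opus-4.8": 0.2},
                {"haiku-4.5": 0.8, "opus-4.8": 0.2}):
        print(mix, f"${blended_cost(mix, 100_000, 5_000, 1_000):,.0f}")
```

For 100,000 requests at 5,000 input and 1,000 output tokens, sending everything to Opus 4.8 costs $5,000; an 80/20 Sonnet/Opus split costs $3,400, and an 80/20 Haiku/Opus split $1,800.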

Credits, API Keys & Billing

Getting onto the Claude API is straightforward. You create an account in the Claude Console, generate an API key, and start sending requests; new users receive a small amount of free credits to test before committing real money. Billing is postpaid on actual monthly usage, in US dollars, with credit-card or invoicing options for larger accounts, and live usage tracking in the Console so you can watch spend accrue in close to real time.

Throughput is governed by usage tiers rather than price. New accounts begin at Tier 1 with modest rate limits and climb through Tiers 2 to 4 as their spend and account history grow, unlocking higher requests-per-minute and tokens-per-minute ceilings at each step. Enterprise accounts can arrange custom limits. If you hit a wall on throughput, the fix is usually to move up a tier through usage or to contact sales, not to change your per-token price.

On discounts, three routes exist beyond the built-in caching and batch savings: volume discounts negotiated case by case for high-usage accounts, academic and research pricing, and bespoke enterprise terms with custom rate limits, dedicated support and committed-use arrangements. If you are spending meaningfully each month, it is worth a conversation with Anthropic sales, since committed volume can unlock rates below the public card. For teams on AWS, the Claude Platform on AWS route bills through the Marketplace in Claude Consumption Units, which can simplify procurement when you are already consolidating spend through Amazon.

Whichever route you choose, the discipline is the same: start on the public rates, instrument your usage, prove the value, and only then optimise the commercial terms. The technical levers, model choice, caching and batching, will almost always save you more than the contract negotiation, so get those right first.

The Bottom Line

Claude API pricing is simple on the surface (per-token, in USD) and deep underneath, where caching, batching, model choice and the surcharge-free 1M context window decide whether a workload costs pennies or pounds. Memorise four input prices per million tokens (Haiku £0.79/$1, Sonnet £2.37/$3, Opus £3.95/$5, Fable £7.90/$10), remember that output costs 5x input, then lean on caching and batch for everything repeatable.

Anthropic adjusts this sheet regularly as new models land and older ones retire, so always confirm against the official pricing page before you commit a budget. We keep this guide updated as the rates change.

Last updated: June 2026. Figures sourced from Anthropic's official Claude API pricing page and verified at the time of writing. Prices are billed in USD; £ values are approximate. Always confirm current rates at claude.com/pricing before budgeting.

Claude API Pricing: Every Model Rate Explained