Claude’s prompt caching: what it saves and when to turn it on

WordPress plugin settings page for Claude API key configuration

Written by

in

Claude introduced prompt caching in late 2024, and most solo operators still don’t use it — even when they’re burning through API credits on repetitive tasks.

The feature lets you cache large chunks of context (style guides, product catalogs, documentation) so Claude doesn’t re-read them on every request. When it works, it cuts costs by 90% and speeds up responses. When it doesn’t, you pay a caching penalty for no benefit.

Here’s how to know which side you’re on.

How prompt caching actually works

Every time you send a prompt to Claude, the API charges you for input tokens (what you send) and output tokens (what Claude generates). With caching enabled, Claude stores the first part of your prompt — the part that doesn’t change between requests — and reuses it for up to five minutes.

Cached input tokens cost 90% less than regular input tokens. But there’s a catch: the cached section must be at least 1,024 tokens, and it has to appear at the start of your prompt. If your repeated context is buried mid-prompt, caching won’t trigger.

Most operators structure their prompts backwards. They put the variable part (the user question, the draft to edit, the product name) first, then append the static instructions. That ordering breaks caching.

To make caching work, flip it: static context first, variable input last.

When caching saves real money

Caching pays off when you’re running the same large prompt dozens or hundreds of times per day. Three scenarios where it matters:

Batch content editing. You’re rewriting 50 product descriptions using the same brand voice guide (3,000 tokens). Without caching, you pay full price for that guide on every request. With caching, you pay once, then 10% for the next 49.

Structured data extraction. You’re parsing invoices, receipts, or support tickets into JSON using the same schema definition (2,000 tokens). Each parse job reuses the schema. Cache it.

Context-heavy chat interfaces. You’re building a support bot that references your entire help center (10,000 tokens) on every question. Cache the help center, send only the user’s question as new input.

If you’re running fewer than 10 requests per day with the same context, caching won’t move the needle. The setup overhead isn’t worth it.

How to structure prompts for caching

Here’s the wrong way (no caching):

User question: [variable input]
Instructions: [3,000-token style guide]

Here’s the right way (caching triggers):

Instructions: [3,000-token style guide]
User question: [variable input]

In the API request, you mark the instructions block as cache_control: {"type": "ephemeral"}. Claude caches everything up to that marker. On the next request, if the cached section is identical, you pay the reduced rate.

One non-obvious detail: the cache expires after five minutes of inactivity. If your workflow runs requests in bursts with long gaps, you’ll pay the caching write cost repeatedly without ever hitting the cache. Caching works best for sustained, high-frequency use — not sporadic jobs.

When caching costs more than it saves

Caching isn’t free. The first time Claude writes to the cache, you pay a 25% premium on those tokens. If you send a 5,000-token prompt once and never reuse it, you’ve paid extra for nothing.

You also lose caching benefits if you tweak the cached section between requests. Changing even one word in your style guide invalidates the cache and triggers a new write. If you’re still iterating on your prompt structure, wait until it’s stable before enabling caching.

And if your repeated context is small (under 1,024 tokens), caching won’t activate at all. The feature is designed for large, static blocks — not short instructions.

What this means for your workflow

Most solo operators should ignore caching until they hit a clear threshold: same large prompt, 20+ times per day, stable structure. Below that, the cost savings are negligible and the cognitive overhead of restructuring prompts isn’t worth it.

But if you’re running batch jobs, building repeatable AI workflows, or prototyping a product that calls Claude hundreds of times, caching can cut your API bill in half. Just don’t bolt it onto your existing prompts without restructuring them first.

Using Claude for high-volume workflows? Subscribe to One Two Three Send for more breakdowns of AI features that actually matter to solo operators.

Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

Other newsletters you might like

My Local Dublin

Dublin Ireland – Explore the city and find things to do, places to see and food to eat.

Subscribe

Love Netherlands

Canal towns, hidden villages, Dutch stories — a slow, loving look at the Netherlands, written by the people who love it most.

Subscribe

Love South Africa

South Africa as a travel destination. The Rainbow nation full of wonderful gems to visit. Going on Safari in the Kruger National Park, visiting the beautiful beaches of Cape Town, indulge in the South African culture and heritage.

Subscribe

Love London

A newsletter for Londoners who want to rediscover their own city. Travellers planning their first or fifth visit. Anglophiles who fell in love with London through literature, film, or a rainy afternoon on the South Bank.

Subscribe

Newsletters via the One Two Three Send network.  ·  Want your newsletter featured here? Click here