TokenFloor · Prompt-cache calculator

Prompt caching cuts the bill on repeated context. This page explains how it works per provider, then helps you work out when it pays off for your workload. Pick a model, set the cacheable prompt size, and find the call count at which the cached and uncached cost lines cross.

What is prompt caching?

When the same context appears in many requests (a long system prompt, RAG snippets, or a document being analysed), the provider can cache it server-side. The first request pays full price, plus a write surcharge on some providers; later requests pay a fraction of the input price for the cached portion, from about 2% to 50% depending on the provider.
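As a rough illustration of that pricing rule, the sketch below totals the input cost of the repeated context over a run of calls, with and without caching. The function name `input_cost`, the $3.00 per million tokens price, and the 10% cached rate are assumptions chosen for the example, and the model deliberately ignores any write surcharge.

```python
from typing import Optional


def input_cost(calls: int, cacheable_tokens: int, price_per_mtok: float,
               cached_ratio: Optional[float] = None) -> float:
    """Dollar cost of the repeated context across `calls` requests.

    cached_ratio is the cached-input price as a share of the input price
    (0.10 means 10%). None means caching is off.
    """
    full = cacheable_tokens * price_per_mtok / 1_000_000
    if cached_ratio is None or calls == 0:
        return calls * full
    # First request pays full price; every later call is assumed to hit the cache.
    return full + (calls - 1) * full * cached_ratio


# Hypothetical workload: 10,000-token prompt, $3.00 per million input tokens, 5 calls.
uncached = input_cost(5, 10_000, 3.00)
cached = input_cost(5, 10_000, 3.00, cached_ratio=0.10)
print(f"{uncached:.3f} {cached:.3f}")  # prints 0.150 0.042
```

In this example the cached run costs a little over a quarter of the uncached one, and the gap widens with every further call that lands on a warm cache.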

When it pays off

Large repeated context (system prompts ≥ 2K tokens, or RAG with a stable corpus)
High reuse rate — typically ≥3 requests within the cache TTL window
Latency-sensitive paths — cache hits also cut time-to-first-token roughly in half

When it doesn't

One-off Q&A (no context to reuse)
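Small system prompts (< 500 tokens): the write surcharge eats the saving, and the prompt may fall below a provider's minimum cacheable size
Low traffic (fewer than 1 request per 5 minutes on Anthropic's default TTL): the cache expires between calls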
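The trade-off in the two lists above can be sketched as a small break-even calculation. This is a simplified model, not any provider's billing logic: `break_even_calls` and `write_multiplier` are hypothetical names, costs are measured in units of one full-price read of the cacheable prompt, and every call after the first is assumed to arrive inside the TTL window.

```python
def break_even_calls(cached_ratio: float, write_multiplier: float = 1.0) -> int:
    """Smallest call count in one TTL window at which caching is strictly cheaper.

    Uncached cost of n calls:  n
    Cached cost of n calls:    write_multiplier + (n - 1) * cached_ratio
    Closed form: n > (write_multiplier - cached_ratio) / (1 - cached_ratio)
    """
    if not 0 <= cached_ratio < 1:
        raise ValueError("cached_ratio must be in [0, 1)")
    if write_multiplier < 0:
        raise ValueError("write_multiplier must be non-negative")
    n = 1
    # Step forward until the cached total drops below the uncached total.
    while write_multiplier + (n - 1) * cached_ratio >= n:
        n += 1
    return n


# Assumed values: 10% cached rate with a 25% write surcharge.
print(break_even_calls(0.10, write_multiplier=1.25))  # prints 2
# Assumed values: 50% cached rate with no write surcharge.
print(break_even_calls(0.50))                         # prints 2
# Assumed values: 10% cached rate with a write that costs double.
print(break_even_calls(0.10, write_multiplier=2.0))   # prints 3
```

Because the model assumes a perfect hit rate after the first call, it gives a lower bound. Cache misses and expired entries push the real crossover later, which is one reason to leave some margin above the computed minimum.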

Provider quick reference

Provider	Cached-input price	Cache TTL	How to activate
Anthropic (Claude 4.x)	10% of input	5 minutes (24h opt-in)	Explicit `cache_control` markers in messages
OpenAI (GPT-5.x, o-series)	50% of input	5–60 minutes (automatic)	Automatic for prompts ≥ 1024 tokens that share a prefix
Google (Gemini 3.x)	25% of input	1 hour default (configurable)	Explicit context-caching API; or implicit caching for prefixes ≥ 4K tokens
DeepSeek (V4)	2% of input	Hours (LRU eviction)	Automatic on disk-cache hits, reported in `prompt_cache_hit_tokens`
xAI (Grok 4.3)	16% of input	5 minutes	Automatic for repeated prefixes
Together / Kimi / Z.AI	10–17% of input	Varies (provider-specific)	See provider docs — coverage uneven across OSS hosting

Discounts above reflect the cached-input rate as a share of the standard input rate. Output tokens are always priced at the standard output rate — caching only reduces the prompt portion. Source: published provider pricing pages, refreshed weekly by the TokenFloor refresh bot.
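To make the note about output pricing concrete, here is a minimal sketch that prices a single cache-hit request using the ratios from the table above. The function `request_cost`, the token counts, and the $3.00 and $15.00 per million token prices are assumptions for illustration only; the Together / Kimi / Z.AI row is left out because the table gives a range rather than a single rate.

```python
# Cached-input price as a share of the standard input price, from the table above.
CACHED_RATIO = {
    "anthropic": 0.10,
    "openai": 0.50,
    "google": 0.25,
    "deepseek": 0.02,
    "xai": 0.16,
}


def request_cost(provider: str, cached_tokens: int, fresh_tokens: int,
                 output_tokens: int, input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Dollar cost of one request whose first `cached_tokens` hit the cache."""
    ratio = CACHED_RATIO[provider]
    cached = cached_tokens * input_price_per_mtok * ratio  # discounted prefix
    fresh = fresh_tokens * input_price_per_mtok            # uncached remainder
    output = output_tokens * output_price_per_mtok         # never discounted
    return (cached + fresh + output) / 1_000_000


# Hypothetical request: 8,000 cached tokens, 500 fresh tokens, 400 output tokens.
cost = request_cost("anthropic", 8_000, 500, 400, 3.00, 15.00)
print(f"{cost:.4f}")  # prints 0.0099
```

In this example the 400 output tokens account for more than half of the total, so a workload with long completions sees a smaller overall saving than the cached-input rate alone suggests.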

Calculate your break-even