TokenFloor · Prompt-cache calculator
Prompt caching cuts the bill on repeated context. This page explains how caching works for each provider, then helps you work out when it pays off for your workload: pick a model, set the cacheable prompt size, and find the call count where the cached and uncached cost lines cross.
What is prompt caching?
When the same context appears in many requests (a long system prompt, RAG snippets, or a document being analysed), the provider can cache it server-side. The first request pays full price, plus a cache-write surcharge on some providers. Later requests that hit the cache pay 2–50% of the standard input price for the cached portion, depending on the provider.
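To make the cost model concrete, here is a minimal Python sketch of the input-side bill for N requests that share one cacheable prefix. It assumes every request after the first is a cache hit (all calls land within the TTL), that prices are in dollars per million tokens, and that `write_multiplier` models an optional cache-write surcharge. The function and parameter names are illustrative, not any provider's API, and the workload numbers are hypothetical.

```python
def cost_without_cache(n_requests: int, cached_tokens: int,
                       other_tokens: int, input_price_per_m: float) -> float:
    """Input cost in dollars when every request pays the full input rate."""
    per_token = input_price_per_m / 1_000_000
    return n_requests * (cached_tokens + other_tokens) * per_token


def cost_with_cache(n_requests: int, cached_tokens: int, other_tokens: int,
                    input_price_per_m: float, cached_share: float,
                    write_multiplier: float = 1.0) -> float:
    """Input cost in dollars when the shared prefix is cached.

    Assumes the first request writes the cache (optionally at a surcharge,
    via write_multiplier) and every later request is a hit billed at
    cached_share of the input rate. Non-cached tokens always pay full price.
    """
    if n_requests <= 0:
        return 0.0
    per_token = input_price_per_m / 1_000_000
    first = (cached_tokens * write_multiplier + other_tokens) * per_token
    later = (cached_tokens * cached_share + other_tokens) * per_token
    return first + (n_requests - 1) * later


# Hypothetical workload: 10 calls, 4,000-token cached prefix, 200 fresh tokens,
# $3 per million input tokens, cached share 10%, 1.25x write surcharge.
print(cost_without_cache(10, 4_000, 200, 3.0))          # about $0.126
print(cost_with_cache(10, 4_000, 200, 3.0, 0.10, 1.25))  # about $0.0318
```

Output tokens are left out on purpose: they are billed at the standard output rate whether or not the prompt was cached.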
When it pays off
- Large repeated context (system prompts ≥ 2K tokens, or RAG with a stable corpus)
- High reuse rate — typically ≥3 requests within the cache TTL window
- Latency-sensitive paths (cache hits also cut time-to-first-token, often by roughly half on long prompts)
When it doesn't
- One-off Q&A (no context to reuse)
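- Small system prompts (<500 tokens; the write surcharge eats most of the saving, and OpenAI's automatic caching only applies from 1024 tokens)
- Low traffic (fewer than one request per 5 minutes on Anthropic, so the cache expires between calls)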
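The reuse-rate and low-traffic rules both come down to whether the next request arrives before the cache expires. The sketch below labels each request in a sorted list of arrival times (in seconds) as a cache write or a hit for a given TTL. Whether a hit resets the expiry clock is an assumption exposed as the hypothetical `refresh_on_hit` flag; check your provider's documentation for the actual behaviour. All names here are illustrative.

```python
from typing import List, Optional


def classify_requests(timestamps: List[float], ttl_seconds: float,
                      refresh_on_hit: bool = True) -> List[str]:
    """Label each request 'write' (cache miss) or 'hit'.

    timestamps must be sorted ascending and expressed in seconds.
    """
    labels: List[str] = []
    expires_at: Optional[float] = None  # eviction time; None = nothing cached yet
    for t in timestamps:
        if expires_at is not None and t < expires_at:
            labels.append("hit")
            if refresh_on_hit:
                # Assumed sliding window: a hit extends the entry's lifetime.
                expires_at = t + ttl_seconds
        else:
            # Cache is cold or expired: this request pays for a fresh write.
            labels.append("write")
            expires_at = t + ttl_seconds
    return labels


def hit_rate(labels: List[str]) -> float:
    """Fraction of requests served from the cache."""
    return labels.count("hit") / len(labels) if labels else 0.0


# One request every 4 minutes stays warm under a 5-minute TTL:
print(classify_requests([0, 240, 480, 720], ttl_seconds=300))
# ['write', 'hit', 'hit', 'hit']

# One request every 6 minutes pays for a fresh write every time:
print(classify_requests([0, 360, 720, 1080], ttl_seconds=300))
# ['write', 'write', 'write', 'write']
```

Feeding a day of real request timestamps through `classify_requests` gives an estimate of the hit rate to expect before turning caching on.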
Provider quick reference
| Provider | Cached-input price | Cache TTL | How to activate |
|---|---|---|---|
| Anthropic (Claude 4.x) | 10% of input | 5 minutes (24h opt-in) | Explicit cache_control markers in messages |
| OpenAI (GPT-5.x, o-series) | 50% of input | 5–60 minutes (automatic) | Automatic for prompts ≥ 1024 tokens that share a prefix |
| Google (Gemini 3.x) | 25% of input | 1 hour default (configurable) | Explicit context-caching API; or implicit caching for prefixes ≥ 4K tokens |
| DeepSeek (V4) | 2% of input | Hours (LRU eviction) | Automatic (disk cache); hits reported in prompt_cache_hit_tokens |
| xAI (Grok 4.3) | 16% of input | 5 minutes | Automatic for repeated prefixes |
| Together / Kimi / Z.AI | 10–17% of input | Varies by provider | See provider docs; coverage is uneven across OSS hosts |
Percentages above are the cached-input rate as a share of the standard input rate: 10% means a cached token costs one tenth of a normal input token, a 90% discount. Output tokens are always billed at the standard output rate; caching only reduces the prompt portion. Source: published provider pricing pages, refreshed weekly by the TokenFloor refresh bot.
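For a quick comparison, the point values from the table can go into a lookup and be applied to one request's hit and miss token counts. This is a sketch: the dictionary keys and the `prompt_cost` function are illustrative, the Together / Kimi / Z.AI range is left out, cache-write surcharges are ignored, and the `usage` dict only mimics the shape of DeepSeek's response, which reports `prompt_cache_hit_tokens` alongside `prompt_cache_miss_tokens`. The token counts and the list price are made up, and the same price is used for every provider so that only the cached share differs.

```python
# Cached-input price as a share of the standard input price, from the table.
CACHED_SHARE = {
    "anthropic": 0.10,
    "openai": 0.50,
    "google": 0.25,
    "deepseek": 0.02,
    "xai": 0.16,
}


def prompt_cost(hit_tokens: int, miss_tokens: int,
                input_price_per_m: float, cached_share: float) -> float:
    """Dollar cost of the prompt portion of one request.

    Cache hits are billed at cached_share of the input rate; misses at the
    full rate. Output tokens are excluded (always standard price), and any
    cache-write surcharge is ignored in this sketch.
    """
    per_token = input_price_per_m / 1_000_000
    return (miss_tokens + hit_tokens * cached_share) * per_token


# Usage block shaped like DeepSeek's; the counts are hypothetical.
usage = {"prompt_cache_hit_tokens": 9_000, "prompt_cache_miss_tokens": 1_000}
price_per_m = 1.0  # hypothetical $ per million input tokens, same for all

for provider, share in CACHED_SHARE.items():
    cost = prompt_cost(usage["prompt_cache_hit_tokens"],
                       usage["prompt_cache_miss_tokens"], price_per_m, share)
    print(f"{provider:10s} ${cost:.5f}")
```

Real list prices differ by provider and model, so in practice `price_per_m` should come from the pricing page for the model being compared.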
Calculate your break-even
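The crossing point can also be worked out by hand. Assume one cache write followed only by hits (every call lands inside the TTL), and write T for the cacheable tokens, U for the uncached tokens per call, P for the input price per token, c for the cached share from the table, and w for a write multiplier (w = 1 when a provider charges no surcharge). These symbols are introduced here for illustration only.

```latex
\begin{aligned}
C_{\text{plain}}(N) &= N\,(T + U)\,P \\
C_{\text{cache}}(N) &= (wT + U)\,P + (N - 1)\,(cT + U)\,P \\
C_{\text{cache}}(N) \le C_{\text{plain}}(N)
  &\iff w + (N - 1)\,c \le N
  \iff N \ge \frac{w - c}{1 - c} \\
N^{*} &= \left\lceil \frac{w - c}{1 - c} \right\rceil
\end{aligned}
```

Under this model U cancels, so the break-even call count depends only on c and w. With a hypothetical w = 1.25 and c = 0.10, N* = ceil(1.15 / 0.90) = 2; with no surcharge (w = 1), N* = 1, so caching never costs more than the plain rate. Each cache expiry adds another write, which pushes the real break-even higher.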