TokenFloor.AI Home

★ EUROPE'S EDGE

TokenFloor · Prompt-cache calculator

Prompt caching cuts the bill on repeated context. This page explains how it works per provider, then helps you figure out when it pays off for your workload. Pick a model, set the cacheable prompt size, and find the call count where the lines cross.

What is prompt caching?

When the same context appears in many requests — a long system prompt, RAG snippets, or a document being analysed — the provider can cache it server-side. The first request pays full price; later requests pay 10–50% of input price for the cached portion.

When it pays off

  • Large repeated context (system prompts ≥ 2K tokens, or RAG with a stable corpus)
  • High reuse rate — typically ≥3 requests within the cache TTL window
  • Latency-sensitive paths — cache hits also cut time-to-first-token roughly in half

When it doesn't

  • One-off Q&A (no context to reuse)
  • Small system prompts (<500 tokens — the write surcharge eats the saving)
  • Low traffic (< 1 request per 5 minutes for Anthropic; cache expires)

Provider quick reference

Provider Cached-input price Cache TTL How to activate
Anthropic (Claude 4.x) 10% of input 5 minutes (24h opt-in) Explicit cache_control markers in messages
OpenAI (GPT-5.x, o-series) 50% of input 5–60 minutes (automatic) Automatic for prompts ≥ 1024 tokens that share a prefix
Google (Gemini 3.x) 25% of input 1 hour default (configurable) Explicit context-caching API; or implicit caching for prefixes ≥ 4K tokens
DeepSeek (V4) 2% of input Hours (LRU eviction) Automatic on disk-cache hits, reported in prompt_cache_hit_tokens
xAI (Grok 4.3) 16% of input 5 minutes Automatic for repeated prefixes
Together / Kimi / Z.AI 10–17% of input Varies (provider-specific) See provider docs — coverage uneven across OSS hosting

Discounts above reflect the cached-input rate as a share of the standard input rate. Output tokens are always priced at the standard output rate — caching only reduces the prompt portion. Source: published provider pricing pages, refreshed weekly by the TokenFloor refresh bot.

Calculate your break-even