AI Skill Report Card

Implementing Prompt Caching

A · 92 · Feb 5, 2026

Quick Start

Bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are an expert data analyst...",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "Analyze this dataset..."}
    ]
  }'

Workflow

Initial Setup:

  1. Structure prompt with static content first (tools, system, context)
  2. Add "cache_control": {"type": "ephemeral"} to final static block
  3. Place dynamic content (user messages) after cached sections
  4. Send initial request (pays cache write cost: 1.25x base price)
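
The steps above map onto a request payload like the following sketch. This is illustrative only, not a live API call; the $3.00-per-million-token base rate is an assumed example, not a quoted price.

```python
# Sketch of the initial-request payload: static system content carries
# cache_control; dynamic user content follows it.
payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are an expert data analyst...",
            # Breakpoint on the final static block
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Analyze this dataset..."}],
}

# The first request pays the cache write premium: 1.25x the base input price.
BASE_PRICE_PER_MTOK = 3.00  # assumed example rate, USD per million input tokens
cached_tokens = 10_000
write_cost = cached_tokens / 1_000_000 * BASE_PRICE_PER_MTOK * 1.25
print(round(write_cost, 5))  # 0.0375
```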

Subsequent Requests:

  1. Keep cached content identical
  2. Modify only content after cache breakpoint
  3. Benefit from 90% cost reduction on cached tokens (0.1x base price)
  4. Cache refreshes automatically on each use (5-minute default lifetime)
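
A quick back-of-envelope for the savings on a cache hit, assuming an example base rate of $3.00 per million input tokens:

```python
# Illustrative savings math for a cache hit (rates are example assumptions).
BASE_PRICE_PER_MTOK = 3.00   # assumed base input price, USD per million tokens
cached_tokens = 100_000      # tokens served from cache on this request

full_cost = cached_tokens / 1_000_000 * BASE_PRICE_PER_MTOK
hit_cost = full_cost * 0.1            # cache reads bill at 0.1x base price
savings = full_cost - hit_cost        # the 90% reduction on cached tokens
print(round(full_cost, 4), round(hit_cost, 4))  # 0.3 0.03
```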

Progress Checklist for Multi-Breakpoint Setup:

  • Identify content that changes at different frequencies
  • Place tool definitions first (rarely change)
  • Add system instructions next (occasional changes)
  • Include context/examples (moderate changes)
  • Set breakpoints before frequently edited sections
  • Verify minimum token requirements met

Examples

Example 1: Document Analysis

Input:

JSON
{
  "system": [
    {
      "type": "text",
      "text": "<entire_document_content>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "What are the main themes?"}
  ]
}

Output: First call creates cache, subsequent questions reuse cached document

Example 2: Code Assistant with Multiple Breakpoints

Input:

JSON
{
  "tools": [
    {
      "name": "execute_code",
      "description": "...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "system": [
    {
      "type": "text",
      "text": "You are a coding assistant...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Debug this function..."}
  ]
}

Output: Tools and system cached separately, allowing independent updates

Best Practices

Content Ordering:

  • Place static content first: tools → system → context → examples
  • Add cache breakpoints at transition points between stability levels
  • Keep frequently changing content after final breakpoint

Token Management:

  • Ensure cached sections meet minimum requirements:
    • Claude Opus 4.5/Haiku 4.5: 4096 tokens
    • Other models: 1024-2048 tokens
  • Monitor cache_read_input_tokens vs cache_creation_input_tokens in responses

Strategic Breakpoints:

  • Use up to 4 breakpoints maximum
  • Place breakpoints before content that might be edited
  • Set final breakpoint at conversation end for maximum cache hits
  • Remember 20-block lookback limit when placing breakpoints
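
One possible pre-flight check for the two limits above, sketched under the assumption that breakpoints are marked with `cache_control` keys in an ordered block list:

```python
# Validate breakpoint placement against two documented limits: at most 4
# breakpoints, and a ~20-block lookback (an uncovered run of more than 20
# blocks before a breakpoint risks missing cache hits).
def check_breakpoints(blocks):
    """blocks: list of content dicts in request order; returns breakpoint count."""
    marks = [i for i, b in enumerate(blocks) if "cache_control" in b]
    if len(marks) > 4:
        raise ValueError("more than 4 cache breakpoints")
    prev = -1
    for i in marks:
        if i - prev > 20:
            raise ValueError(f"gap of {i - prev} blocks before index {i}")
        prev = i
    return len(marks)

blocks = [{"type": "text", "text": "..."} for _ in range(10)]
blocks[4]["cache_control"] = {"type": "ephemeral"}
blocks[9]["cache_control"] = {"type": "ephemeral"}
print(check_breakpoints(blocks))  # 2
```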

Performance Monitoring:

JSON
{
  "usage": {
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 100000,
    "input_tokens": 50
  }
}
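
Reading that usage block in code (an illustrative interpretation; the field names match the API response above):

```python
# A usage block like the one above signals a clean cache hit: 100k tokens
# were served from cache and only 50 tokens billed at the full input rate.
usage = {
    "cache_creation_input_tokens": 0,    # nothing written: this was a hit
    "cache_read_input_tokens": 100_000,  # tokens billed at 0.1x base price
    "input_tokens": 50,                  # uncached tokens at 1x base price
}

total_input = sum(usage.values())
cache_hit = (
    usage["cache_read_input_tokens"] > 0
    and usage["cache_creation_input_tokens"] == 0
)
print(total_input, cache_hit)  # 100050 True
```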

Common Pitfalls

Cache Invalidation:

  • Don't modify cached content; any change invalidates the cache from that point onward
  • Avoid changing tool definitions, system prompts, or web search settings
  • Remember hierarchy: tools → system → messages (changes invalidate downstream)
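
One way to picture the hierarchy is to model each cache entry's key as a hash of everything that precedes its breakpoint. This is an illustrative model, not the actual implementation:

```python
import hashlib

# Model: each breakpoint's cache entry is keyed by the full prefix before
# it, so editing the tools section changes the keys of every downstream
# breakpoint (system, messages) as well.
def prefix_key(*sections):
    return hashlib.sha256("".join(sections).encode()).hexdigest()

tools_key_a = prefix_key("tools-v1")
system_key_a = prefix_key("tools-v1", "system-v1")

# Editing only the tools section...
tools_key_b = prefix_key("tools-v2")
system_key_b = prefix_key("tools-v2", "system-v1")

# ...also invalidates the system-level cache, even though the system
# prompt itself did not change.
print(system_key_a == system_key_b)  # False
```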

Ineffective Caching:

  • Don't cache frequently changing content
  • Don't place dynamic content before static content
  • Don't use cache for prompts under minimum token thresholds
  • Don't expect cache hits for concurrent requests (wait for first response)

Cost Miscalculation:

  • Don't forget cache write costs (25% premium for 5-minute TTL)
  • Don't assume input_tokens represents total input (only post-breakpoint tokens)
  • Don't overlook that thinking blocks can't be directly cached
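
A break-even sketch tying the 1.25x write premium and the 0.1x read rate together (the base rate is an assumed example):

```python
# Break-even math: caching pays off once the 25% write premium is
# recovered by 0.1x-priced reads within the cache TTL.
BASE = 3.00 / 1_000_000       # assumed USD per input token (example rate)
tokens = 100_000              # size of the cached prefix

write = tokens * BASE * 1.25  # first request: cache write at 1.25x
read = tokens * BASE * 0.10   # each subsequent hit: 0.1x
plain = tokens * BASE         # without caching: 1x on every request

def total(n_requests, cached):
    """Total input cost for n_requests over the same prefix."""
    if cached:
        return write + (n_requests - 1) * read
    return n_requests * plain

# One request alone is more expensive with caching; two or more inside
# the TTL are already cheaper.
print(total(2, cached=True) < total(2, cached=False))  # True
```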

Structural Issues:

  • Don't let more than 20 blocks accumulate before a breakpoint without adding intermediate breakpoints (the lookback only checks the previous ~20 blocks)
  • Don't try to cache empty text blocks or sub-content blocks directly
  • Don't place breakpoints on content that changes frequently
Grade: A

AI Skill Framework Scorecard

Criteria Breakdown

  Quick Start: 11/15
  Workflow: 11/15
  Examples: 15/20
  Completeness: 15/20
  Format: 11/15
  Conciseness: 11/15