AI Skill Report Card
Implementing Prompt Caching
Quick Start
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are an expert data analyst...",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "Analyze this dataset..."}
    ]
  }'
```
Workflow
Initial Setup:
- Structure prompt with static content first (tools, system, context)
- Add `"cache_control": {"type": "ephemeral"}` to the final static block
- Place dynamic content (user messages) after cached sections
- Send initial request (pays cache write cost: 1.25x base price)
Subsequent Requests:
- Keep cached content identical
- Modify only content after cache breakpoint
- Benefit from 90% cost reduction on cached tokens (0.1x base price)
- Cache refreshes automatically on each use (5-minute default lifetime)
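The two-phase workflow above can be sketched in Python. This builds raw Messages API payloads as plain dicts (the analyst prompt and questions are placeholders, not from a real deployment):

```python
# Sketch of the cache write/read workflow. The static system block stays
# byte-identical across calls; only the user message changes.

STATIC_SYSTEM = [
    {
        "type": "text",
        "text": "You are an expert data analyst...",  # large, stable content
        "cache_control": {"type": "ephemeral"},       # breakpoint: cache up to here
    }
]

def build_request(question: str) -> dict:
    """Build a payload whose prefix before the breakpoint never changes."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "system": STATIC_SYSTEM,
        "messages": [{"role": "user", "content": question}],
    }

first = build_request("Analyze this dataset...")      # pays the 1.25x write cost
second = build_request("Now summarize the outliers")  # reads the cache at 0.1x
```

Because both requests share the same `STATIC_SYSTEM` object, the cached prefix is guaranteed identical, which is exactly the condition for a cache hit.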
Progress Checklist for Multi-Breakpoint Setup:
- Identify content that changes at different frequencies
- Place tools definitions first (rarely change)
- Add system instructions next (occasional changes)
- Include context/examples (moderate changes)
- Set breakpoints before frequently edited sections
- Verify minimum token requirements met
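The checklist ordering can be made concrete as a payload sketch, with one breakpoint per stability tier (the tool and prompt text here are placeholders):

```python
# Order sections from least to most frequently changing, and put a
# cache_control breakpoint at the end of each stable tier (up to 4 total).

CACHE = {"cache_control": {"type": "ephemeral"}}

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    # Tier 1: tool definitions (rarely change)
    "tools": [
        {"name": "execute_code", "description": "Run a code snippet...", **CACHE},
    ],
    # Tiers 2-3: system instructions, then context/examples (change more often);
    # a single breakpoint after the last stable block covers everything before it.
    "system": [
        {"type": "text", "text": "You are a coding assistant..."},
        {"type": "text", "text": "<context and few-shot examples>", **CACHE},
    ],
    # Dynamic content goes after the final breakpoint
    "messages": [{"role": "user", "content": "Debug this function..."}],
}

breakpoints = sum(
    1
    for section in (payload["tools"], payload["system"])
    for block in section
    if "cache_control" in block
)
assert breakpoints <= 4  # maximum of 4 breakpoints per request
```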
Examples
Example 1: Document Analysis

Input:
```json
{
  "system": [
    {
      "type": "text",
      "text": "<entire_document_content>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "What are the main themes?"}
  ]
}
```
Output: The first call creates the cache; subsequent questions reuse the cached document
Example 2: Code Assistant with Multiple Breakpoints

Input:
```json
{
  "tools": [
    {
      "name": "execute_code",
      "description": "...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "system": [
    {
      "type": "text",
      "text": "You are a coding assistant...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Debug this function..."}
  ]
}
```
Output: Tools and system are cached at separate breakpoints, so editing the system prompt still reuses the tools cache
Best Practices
Content Ordering:
- Place static content first: tools → system → context → examples
- Add cache breakpoints at transition points between stability levels
- Keep frequently changing content after final breakpoint
Token Management:
- Ensure cached sections meet minimum token requirements:
  - Claude Opus 4.5/Haiku 4.5: 4096 tokens
  - Other models: 1024-2048 tokens
- Monitor `cache_read_input_tokens` vs `cache_creation_input_tokens` in responses
Strategic Breakpoints:
- Use up to 4 breakpoints maximum
- Place breakpoints before content that might be edited
- Set final breakpoint at conversation end for maximum cache hits
- Remember 20-block lookback limit when placing breakpoints
Performance Monitoring:
```json
{
  "usage": {
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 100000,
    "input_tokens": 50
  }
}
```
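As a rough check, a usage block like the one above translates into billed input cost using the multipliers from this document (1.25x for 5-minute cache writes, 0.1x for cache reads); this helper is a sketch, not part of any SDK:

```python
def billed_input_token_equivalents(usage: dict) -> float:
    """Convert a usage block into base-price token equivalents."""
    return (
        usage.get("input_tokens", 0) * 1.0                    # uncached: full price
        + usage.get("cache_creation_input_tokens", 0) * 1.25  # writes: 25% premium
        + usage.get("cache_read_input_tokens", 0) * 0.10      # reads: 90% discount
    )

usage = {
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 100000,
    "input_tokens": 50,
}

# 100,050 raw input tokens are billed as only 10,050 base-token equivalents.
print(billed_input_token_equivalents(usage))  # 10050.0
```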
Common Pitfalls
Cache Invalidation:
- Don't modify any cached content: a change invalidates the cache from that point onward
- Avoid changing tool definitions, system prompts, or web search settings
- Remember hierarchy: tools → system → messages (changes invalidate downstream)
Ineffective Caching:
- Don't cache frequently changing content
- Don't place dynamic content before static content
- Don't use cache for prompts under minimum token thresholds
- Don't expect cache hits for concurrent requests (wait for first response)
Cost Miscalculation:
- Don't forget cache write costs (25% premium for 5-minute TTL)
- Don't assume `input_tokens` represents the total input; it only counts tokens after the final cache breakpoint
- Don't overlook that thinking blocks can't be directly cached
Structural Issues:
- Don't let more than 20 content blocks accumulate before a breakpoint without adding intermediate breakpoints (cache lookup only checks the previous 20 blocks)
- Don't try to cache empty text blocks or sub-content blocks directly
- Don't place breakpoints on content that changes frequently