AI Skill Report Card
Implementing Prompt Caching
Quick Start
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are an AI assistant for analyzing documents.",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Analyze this document..."
      }
    ]
  }'
```
Recommendation:
Add concrete cost savings examples with specific token counts and dollar amounts to better demonstrate ROI
Workflow
- Structure prompt with static content first (tools, system, context)
- Add cache_control to the last block of reusable content
- Place variable content after cache breakpoint
- Monitor cache performance via response fields
- Adjust breakpoints based on usage patterns
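As a sketch, the workflow above maps onto a request payload like the following. The tool definition, system text, and query are illustrative placeholders, not part of any real deployment:

```python
# Sketch of the workflow above: static content (tools, system) first,
# cache breakpoint on the last static block, variable query last.
STATIC_TOOLS = [
    {
        "name": "search_docs",  # hypothetical tool for illustration
        "description": "Search the reference documents.",
        "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
    }
]

STATIC_SYSTEM = [
    {
        "type": "text",
        "text": "You are an AI assistant for analyzing documents.",
        "cache_control": {"type": "ephemeral"},  # breakpoint ends the static prefix
    }
]

def build_request(user_query: str) -> dict:
    """Assemble a /v1/messages payload: static content first, variable content last."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "tools": STATIC_TOOLS,
        "system": STATIC_SYSTEM,
        "messages": [{"role": "user", "content": user_query}],
    }

payload = build_request("What are the main themes?")
```

Because only the final `messages` entry varies between calls, every request shares the same cacheable prefix.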
Cache Hierarchy Order
- Tools - Function definitions, schemas
- System - Instructions, context, examples
- Messages - Conversation history
Changes at any level invalidate that level and all subsequent levels.
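One way to see why a change at one level invalidates everything after it: each level's cache key behaves like a running hash over that level plus every level before it. This is a conceptual model only, not Anthropic's actual implementation:

```python
import hashlib
import json

def prefix_keys(tools, system, messages):
    """Conceptual model of hierarchical invalidation: each level's key
    hashes that level plus all earlier levels, so editing tools
    changes the keys for tools, system, AND messages."""
    h = hashlib.sha256()
    keys = {}
    for name, part in (("tools", tools), ("system", system), ("messages", messages)):
        h.update(json.dumps(part, sort_keys=True).encode())
        keys[name] = h.hexdigest()
    return keys

a = prefix_keys(["tool_v1"], ["sys"], ["msg"])
b = prefix_keys(["tool_v2"], ["sys"], ["msg"])   # only tools changed
c = prefix_keys(["tool_v1"], ["sys"], ["other"]) # only messages changed
```

Changing `tools` alters all three keys, while changing `messages` leaves the tools and system keys intact, matching the hierarchy described above.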
Recommendation:
Include a troubleshooting section with specific error messages and solutions for common cache issues
Examples
Example 1: Document Analysis

Input:
```json
{
  "system": [
    {"type": "text", "text": "<large document content>", "cache_control": {"type": "ephemeral"}}
  ],
  "messages": [
    {"role": "user", "content": "What are the main themes?"}
  ]
}
```
Output: The first call creates the cache; subsequent calls with the same document hit the cache and process only the new question.
Example 2: Coding Assistant with Tools

Input:
```json
{
  "tools": [...tool_definitions...],
  "system": [
    {"type": "text", "text": "You are a coding assistant.", "cache_control": {"type": "ephemeral"}}
  ]
}
```
Output: Tools and system instructions are cached; only new code queries are processed.
Recommendation:
Provide a decision matrix or flowchart for when to use caching vs. when overhead isn't worth it
Best Practices
Content Placement:
- Static content first: system instructions, context, examples
- Variable content last: user queries, dynamic data
- Set cache breakpoint at end of static content
Multiple Breakpoints (max 4):
- Use when content changes at different frequencies
- Place before editable sections
- Helps work within the 20-block lookback limit
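A sketch of a multi-breakpoint payload, with one breakpoint per change frequency (all content strings and the tool are placeholders):

```python
# Sketch: separate breakpoints for content that changes at different rates.
# Tools change rarely, the system prompt occasionally, and conversation
# history every turn; each gets its own breakpoint (at most 4 per request).
CC = {"cache_control": {"type": "ephemeral"}}

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [
        {"name": "lookup",  # hypothetical tool
         "description": "Illustrative tool definition.",
         "input_schema": {"type": "object", "properties": {}},
         **CC},  # breakpoint 1: rarely changes
    ],
    "system": [
        {"type": "text", "text": "Stable instructions...", **CC},  # breakpoint 2
    ],
    "messages": [
        {"role": "user",
         "content": [{"type": "text", "text": "Earlier turn...", **CC}]},  # breakpoint 3: history
        {"role": "user", "content": "New question..."},  # variable content, after the last breakpoint
    ],
}
```

When only the final message changes between turns, the three cached prefixes can be reused independently.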
Minimum Cache Sizes:
- Claude Opus 4.5: 4,096 tokens
- Claude Sonnet 4.5/4: 1,024 tokens
- Claude Haiku 4.5: 4,096 tokens
- Claude Haiku 3.5: 2,048 tokens
Performance Monitoring:
```json
{
  "cache_creation_input_tokens": 50000,
  "cache_read_input_tokens": 0,
  "input_tokens": 20
}
```
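A small helper (a sketch, not part of any SDK) can classify each response's usage fields so cache behavior is easy to log:

```python
def cache_status(usage: dict) -> str:
    """Classify a response as a cache write, hit, or miss.
    Field names match the API's usage object; the helper itself is a sketch."""
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read > 0:
        return "hit"
    if created > 0:
        return "write"
    return "miss"

# Illustrative usage objects: a first call that writes the cache,
# then a follow-up call that reads it.
first = {"cache_creation_input_tokens": 50000, "cache_read_input_tokens": 0, "input_tokens": 20}
later = {"cache_creation_input_tokens": 0, "cache_read_input_tokens": 50000, "input_tokens": 25}
```

Tracking the ratio of hits to writes over time shows whether breakpoints are placed well.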
Common Pitfalls
Cache Invalidation:
- Don't modify cached content unnecessarily
- Web search/citations toggles invalidate system cache
- Tool choice changes invalidate message cache
Concurrent Requests:
- Cache only available after first response begins
- Wait for first response before sending parallel requests
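The warm-then-fan-out pattern can be sketched as below. The `send` function is a stand-in for a real API call, used only to show the sequencing:

```python
from concurrent.futures import ThreadPoolExecutor

def send(query: str) -> dict:
    """Stand-in for an API call; a real version would POST to /v1/messages."""
    return {"query": query, "usage": {"cache_read_input_tokens": 0}}

# Prime the cache with one request and wait for it to complete...
warmup = send("First question")

# ...then fan out in parallel; by now the cache entry exists,
# so these requests can read it instead of re-writing it.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(send, ["Q2", "Q3", "Q4", "Q5"]))
```

Skipping the warmup step means every parallel request pays the cache-write premium on the same prefix.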
Content Limitations:
- Cannot cache thinking blocks directly
- Cannot cache sub-content blocks (cache parent instead)
- Cannot cache empty text blocks
Pricing Structure:
- 5-minute cache writes: 1.25x base price
- 1-hour cache writes: 2x base price
- Cache reads: 0.1x base price
- Only pay for what's actually cached/read
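The multipliers above make the break-even arithmetic easy to check. For a cached prefix of C tokens at base price P per token, one 5-minute-tier write costs 1.25·C·P and each subsequent read 0.1·C·P, versus 1.0·C·P per call without caching. A sketch, using a hypothetical $3-per-million-token base price:

```python
def total_cost(cached_tokens: int, calls: int, base_price: float,
               write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Cost of the cached portion: one write, then (calls - 1) cache reads.
    Defaults are the 5-minute-tier multipliers listed above."""
    write = cached_tokens * base_price * write_mult
    reads = cached_tokens * base_price * read_mult * (calls - 1)
    return write + reads

def uncached_cost(cached_tokens: int, calls: int, base_price: float) -> float:
    """Cost of sending the same tokens fresh on every call."""
    return cached_tokens * base_price * calls

# e.g. 50,000 static tokens at a hypothetical $3 per million input tokens:
P = 3 / 1_000_000
# over 10 calls: roughly $0.32 cached vs $1.50 uncached for the static portion
```

A single call costs more with caching (the 1.25x write premium with no reads to offset it); the second call already puts caching ahead.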
Lookback Window:
- System checks only 20 blocks backward from breakpoint
- Set explicit breakpoints for content beyond 20 blocks
- Place breakpoints before frequently edited content