AI Skill Report Card
Generated Skill
Quick Start
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant with access to the company handbook.",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "What is the vacation policy?"}
    ]
  }'
```
Workflow
- Identify cacheable content: Static instructions, large documents, tool definitions, examples
- Structure prompt hierarchy: tools → system → messages (in that order)
- Place cache breakpoints: Add `cache_control: {"type": "ephemeral"}` to the last block of cacheable content
- Monitor performance: Check `cache_read_input_tokens` and `cache_creation_input_tokens` in responses
- Optimize placement: Adjust breakpoints based on usage patterns
Cache Strategy Checklist:
- Minimum token count met (1024+ for Sonnet/Opus models; 4096+ for Opus 4.5 and Haiku 4.5)
- Static content placed at beginning
- Cache breakpoint at end of conversation
- Additional breakpoints before frequently edited sections
- Performance monitoring implemented
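The checklist can be automated as a rough client-side sanity check before sending a request. `checkCacheStrategy` below is a hypothetical helper, not part of any SDK; it only counts breakpoints and verifies the 4-breakpoint limit:

```javascript
// Sketch: validate a Messages API payload against the cache checklist.
// Assumes the { tools, system, messages } request shape used in this document.
function checkCacheStrategy(payload) {
  const blocks = [
    ...(payload.tools ?? []),
    ...(payload.system ?? []),
    // Content blocks inside messages can also carry cache_control;
    // plain string contents have no blocks to inspect.
    ...(payload.messages ?? []).flatMap(m =>
      Array.isArray(m.content) ? m.content : []
    ),
  ];
  const breakpoints = blocks.filter(b => b.cache_control).length;
  return {
    breakpoints,
    withinLimit: breakpoints <= 4, // at most 4 breakpoints per request
    hasBreakpoint: breakpoints > 0,
  };
}

const payload = {
  system: [
    { type: "text", text: "Static instructions...", cache_control: { type: "ephemeral" } },
  ],
  messages: [{ role: "user", content: "Question" }],
};
console.log(checkCacheStrategy(payload)); // { breakpoints: 1, withinLimit: true, hasBreakpoint: true }
```

Note this does not check the token minimum; that requires actual token counts.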
Examples
Example 1: Document Analysis
```json
{
  "system": [
    {"type": "text", "text": "You analyze documents for key insights."},
    {"type": "text", "text": "<entire-document-content>", "cache_control": {"type": "ephemeral"}}
  ],
  "messages": [
    {"role": "user", "content": "Summarize the main points"}
  ]
}
```
Example 2: Multi-tool Setup
```json
{
  "tools": [
    {"name": "...", "description": "...", "input_schema": {...}, "cache_control": {"type": "ephemeral"}}
  ],
  "system": [
    {"type": "text", "text": "Use tools to help users with calculations."}
  ],
  "messages": [
    {"role": "user", "content": "Calculate compound interest for $1000 at 5% for 10 years"}
  ]
}
```
Response tracking:
```json
{
  "usage": {
    "cache_creation_input_tokens": 15000,
    "cache_read_input_tokens": 0,
    "input_tokens": 25,
    "output_tokens": 150
  }
}
```
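A usage block like this tells you whether a request wrote or read the cache. A minimal sketch of that interpretation (`classifyCacheResult` is an illustrative helper, not an SDK function):

```javascript
// Interpret the usage block of a Messages API response.
function classifyCacheResult(usage) {
  if (usage.cache_read_input_tokens > 0) return "cache hit";
  if (usage.cache_creation_input_tokens > 0) return "cache write";
  return "no caching";
}

// The first request with a new prefix writes the cache...
classifyCacheResult({ cache_creation_input_tokens: 15000, cache_read_input_tokens: 0 }); // "cache write"
// ...later requests with the same prefix read it.
classifyCacheResult({ cache_creation_input_tokens: 0, cache_read_input_tokens: 15000 }); // "cache hit"
```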
Best Practices
Content Organization
- Put static content first: system instructions, context, examples, tool definitions
- Place variable content last: user messages, dynamic data
- Use up to 4 cache breakpoints for different change frequencies
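The static-first layout can be expressed as a small request builder. This is a sketch under the request shape shown in the examples above; `buildRequest` and its parameters are illustrative names:

```javascript
// Sketch: tools first, then static system blocks, then variable messages,
// with the cache breakpoint on the last static block.
function buildRequest(tools, staticSystem, userMessage) {
  const system = staticSystem.map((text, i) => {
    const block = { type: "text", text };
    if (i === staticSystem.length - 1) {
      block.cache_control = { type: "ephemeral" }; // cache everything up to here
    }
    return block;
  });
  return {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    tools,
    system,
    messages: [{ role: "user", content: userMessage }], // variable content last
  };
}

const req = buildRequest([], ["Instructions...", "Large reference document..."], "Question?");
// Only the final static block carries cache_control.
```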
Strategic Breakpoint Placement
- Always place breakpoint at conversation end
- Add breakpoints before content that changes frequently
- Consider 20-block lookback window for automatic prefix checking
Cost Optimization
- 5-minute cache: 1.25x write cost, 0.1x read cost
- 1-hour cache: 2x write cost, 0.1x read cost (premium feature)
- Cache writes are charged once per cache lifetime (and again if the cache expires and is rewritten); within the TTL, reads cost 90% less than uncached input
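These multipliers make the break-even math easy to sketch. The code below is a back-of-the-envelope cost model, not official pricing logic; the $3/MTok base rate is a placeholder, so check current pricing:

```javascript
// Estimate input cost for one request from its usage block,
// using the multipliers above: write 1.25x (5m) or 2x (1h), read 0.1x.
function requestCost(usage, baseInputPricePerMTok, ttl = "5m") {
  const writeMult = ttl === "1h" ? 2.0 : 1.25;
  const perToken = baseInputPricePerMTok / 1_000_000;
  return (
    usage.cache_creation_input_tokens * writeMult * perToken +
    usage.cache_read_input_tokens * 0.1 * perToken +
    usage.input_tokens * perToken
  );
}

// With a $3/MTok base rate: the first call pays the 1.25x write premium,
// repeat calls read the same 15k-token prefix at 0.1x.
const first  = requestCost({ cache_creation_input_tokens: 15000, cache_read_input_tokens: 0, input_tokens: 25 }, 3);
const repeat = requestCost({ cache_creation_input_tokens: 0, cache_read_input_tokens: 15000, input_tokens: 25 }, 3);
```

The write premium pays for itself after roughly one repeated request, which is why caching large static prefixes is usually worthwhile.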
Performance Monitoring
```javascript
// usage comes from the API response, e.g. response.usage
const { cache_read_input_tokens, cache_creation_input_tokens, input_tokens } = usage;

// Total input tokens processed (input_tokens alone excludes cached tokens)
const totalInput = cache_read_input_tokens + cache_creation_input_tokens + input_tokens;

// Cache hit rate: fraction of input tokens served from cache
const cacheHitRate = cache_read_input_tokens / totalInput;
```
Common Pitfalls
Cache Invalidation
- Tool changes: Modifying tool definitions invalidates entire cache
- System modifications: Web search/citations toggles invalidate system + messages
- Image changes: Adding/removing images invalidates message cache
- Hierarchy violations: Changes invalidate current level + all subsequent levels
Size Limitations
- Too small: Sub-minimum token counts aren't cached (varies by model)
- Concurrent requests: Cache only available after first response begins
- Empty blocks: Cannot cache empty text blocks
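Because sub-minimum prefixes are silently not cached, a pre-flight size check is useful. The helper below is hypothetical and uses a crude chars/4 token estimate; in practice, use a real token counting endpoint. The per-model minimums mirror the checklist above:

```javascript
// Approximate minimum cacheable prompt sizes (tokens), per the checklist above.
const CACHE_MINIMUMS = { "claude-sonnet-4-5": 1024, "claude-haiku-4-5": 4096 };

// Rough estimate: ~4 characters per token for English text.
function meetsCacheMinimum(model, promptText) {
  const approxTokens = Math.ceil(promptText.length / 4);
  return approxTokens >= (CACHE_MINIMUMS[model] ?? 1024);
}

meetsCacheMinimum("claude-sonnet-4-5", "short prompt"); // false — below minimum, silently not cached
```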
Structural Issues
- Wrong order: Cache breakpoints must follow tools → system → messages hierarchy
- Thinking blocks: Cannot cache directly (but can be included in cached assistant turns)
- Sub-content: Cache parent blocks, not citations or sub-elements
Monitoring Mistakes
- Misreading metrics: `input_tokens` only counts tokens after the last cache breakpoint, not the total input
- Ignoring invalidation: Not tracking which changes break cache efficiency
- Poor breakpoint strategy: Not considering edit patterns when placing breakpoints
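The metrics pitfall is easy to see with the sample usage block from the response-tracking example above:

```javascript
// Sample usage block from a first (cache-writing) request.
const usage = {
  cache_creation_input_tokens: 15000,
  cache_read_input_tokens: 0,
  input_tokens: 25,
};

// input_tokens alone looks tiny, but the request actually processed
// 15,025 input tokens; sum all three fields to get the real total.
const totalInput =
  usage.cache_creation_input_tokens + usage.cache_read_input_tokens + usage.input_tokens;
```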