AI Skill Report Card

Generated Skill

Grade B- · Score 70 · Feb 5, 2026

Quick Start

Bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant with access to the company handbook.",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "What is the vacation policy?"}
    ]
  }'

Workflow

  1. Identify cacheable content: Static instructions, large documents, tool definitions, examples
  2. Structure prompt hierarchy: tools → system → messages (in that order)
  3. Place cache breakpoints: Add cache_control: {"type": "ephemeral"} to the last block of cacheable content
  4. Monitor performance: Check cache_read_input_tokens and cache_creation_input_tokens in responses
  5. Optimize placement: Adjust breakpoints based on usage patterns
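Steps 1–3 of the workflow can be sketched as a small request builder; the model name and content strings below are illustrative placeholders, not requirements:

```javascript
// Assemble a request with static, cacheable content first
// (tools → system → messages) and a cache breakpoint on the
// last static block. All names here are placeholders.
function buildCachedRequest(staticSystemText, userMessage) {
  return {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: staticSystemText,
        // Step 3: breakpoint on the last block of cacheable content
        cache_control: { type: "ephemeral" },
      },
    ],
    // Variable content (user messages) comes after the cached prefix
    messages: [{ role: "user", content: userMessage }],
  };
}

const req = buildCachedRequest("You are a helpful assistant.", "Hi");
```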

Cache Strategy Checklist:

  • Minimum token count met (1024+ for most Sonnet/Opus models; 4096+ for Opus 4.5 and Haiku 4.5)
  • Static content placed at beginning
  • Cache breakpoint at end of conversation
  • Additional breakpoints before frequently edited sections
  • Performance monitoring implemented
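A rough helper for the first checklist item, assuming the per-model minimums listed above and a crude ~4-characters-per-token estimate (both are approximations, not API guarantees):

```javascript
// Approximate per-model cache minimums from the checklist above.
// Model IDs and exact thresholds are assumptions; verify against
// current documentation before relying on them.
const CACHE_MINIMUMS = {
  "claude-sonnet-4-5": 1024,
  "claude-opus-4-5": 4096,
  "claude-haiku-4-5": 4096,
};

// Very rough token estimate (~4 characters per token in English text).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function meetsCacheMinimum(model, text) {
  const min = CACHE_MINIMUMS[model] ?? 1024;
  return estimateTokens(text) >= min;
}
```

Prompts below the minimum are simply processed uncached, so this check flags content that will never produce a cache hit.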

Examples

Example 1: Document Analysis

JSON
{
  "system": [
    {"type": "text", "text": "You analyze documents for key insights."},
    {
      "type": "text",
      "text": "<entire-document-content>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Summarize the main points"}
  ]
}

Example 2: Multi-tool Setup

JSON
{
  "tools": [
    {
      "name": "...",
      "description": "...",
      "input_schema": {...},
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "system": [
    {"type": "text", "text": "Use tools to help users with calculations."}
  ],
  "messages": [
    {"role": "user", "content": "Calculate compound interest for $1000 at 5% for 10 years"}
  ]
}

Response tracking:

JSON
{
  "usage": {
    "cache_creation_input_tokens": 15000,
    "cache_read_input_tokens": 0,
    "input_tokens": 25,
    "output_tokens": 150
  }
}
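A small helper for workflow step 4 that classifies a response's usage block; the field names match the example above, and the threshold logic is a sketch, not an official API contract:

```javascript
// Classify a usage block: a cache write (first request),
// a cache hit (subsequent requests), or no caching at all.
function cacheStatus(usage) {
  if (usage.cache_read_input_tokens > 0) return "hit";
  if (usage.cache_creation_input_tokens > 0) return "write";
  return "uncached";
}

// The first request in the example above writes the cache...
const firstRequest = {
  cache_creation_input_tokens: 15000,
  cache_read_input_tokens: 0,
  input_tokens: 25,
  output_tokens: 150,
};

// ...and a follow-up request within the TTL reads it back.
const followUp = {
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 15000,
  input_tokens: 40,
  output_tokens: 200,
};
```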

Best Practices

  • Put static content first: system instructions, context, examples, tool definitions
  • Place variable content last: user messages, dynamic data
  • Use up to 4 cache breakpoints for different change frequencies
  • Always place breakpoint at conversation end
  • Add breakpoints before content that changes frequently
  • Consider 20-block lookback window for automatic prefix checking
  • 5-minute cache: 1.25x write cost, 0.1x read cost
  • 1-hour cache: 2x write cost, 0.1x read cost (premium feature)
  • Cache writes are charged only when the cache is created or refreshed; reads cost 90% less than regular input tokens
JavaScript
// Pull the usage fields from the API response
const { cache_read_input_tokens: cacheRead,
        cache_creation_input_tokens: cacheCreation,
        input_tokens: inputTokens } = response.usage;

// Calculate total input tokens across cached and uncached portions
const totalInput = cacheRead + cacheCreation + inputTokens;

// Track cache hit rate: share of input tokens served from cache
const cacheHitRate = cacheRead / totalInput;
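Using the multipliers above (1.25x write, 0.1x read for the 5-minute cache), a break-even sketch for a prefix reused across N requests; this models only the shared prefix, not the variable per-request tokens:

```javascript
// Cost of N requests sharing a cached prefix, relative to sending
// the prefix uncached each time. Multipliers follow the 5-minute
// cache pricing above (one 1.25x write, then 0.1x reads).
function cachedCostRatio(requests) {
  const WRITE = 1.25; // first request writes the cache
  const READ = 0.10;  // each subsequent request reads it
  const cached = WRITE + READ * (requests - 1);
  const uncached = 1.0 * requests;
  return cached / uncached; // < 1 means caching is cheaper
}
```

A single request costs 1.25x (caching loses), but by the second request the ratio drops below 1, so caching pays for itself whenever the prefix is reused at least once within the TTL.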

Common Pitfalls

  • Tool changes: Modifying tool definitions invalidates entire cache
  • System modifications: Web search/citations toggles invalidate system + messages
  • Image changes: Adding/removing images invalidates message cache
  • Hierarchy violations: Changes invalidate current level + all subsequent levels
  • Too small: Sub-minimum token counts aren't cached (varies by model)
  • Concurrent requests: Cache only available after first response begins
  • Empty blocks: Cannot cache empty text blocks
  • Wrong order: Cache breakpoints must follow tools → system → messages hierarchy
  • Thinking blocks: Cannot cache directly (but can be included in cached assistant turns)
  • Sub-content: Cache parent blocks, not citations or sub-elements
  • Misreading metrics: input_tokens only shows post-breakpoint tokens, not total input
  • Ignoring invalidation: Not tracking what changes break cache efficiency
  • Poor breakpoint strategy: Not considering edit patterns when placing breakpoints
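The hierarchy rule from the pitfalls above (a change invalidates its own level plus everything after it) can be sketched as:

```javascript
// Cache hierarchy: a change at one level invalidates that level
// and all subsequent levels (tools → system → messages).
const HIERARCHY = ["tools", "system", "messages"];

function invalidatedLevels(changedLevel) {
  const i = HIERARCHY.indexOf(changedLevel);
  return i === -1 ? [] : HIERARCHY.slice(i);
}
```

For example, editing a tool definition invalidates everything, while editing only the conversation leaves cached tools and system blocks intact.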
Grade: B- (AI Skill Framework)

Scorecard

Criteria Breakdown:

  • Quick Start: 11/15
  • Workflow: 11/15
  • Examples: 15/20
  • Completeness: 15/20
  • Format: 11/15
  • Conciseness: 11/15