AI Skill Report Card

Implementing Prompt Caching

A- · 85 · Feb 5, 2026
Bash

  curl https://api.anthropic.com/v1/messages \
    -H "content-type: application/json" \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -d '{
      "model": "claude-sonnet-4-5",
      "max_tokens": 1024,
      "system": [
        {
          "type": "text",
          "text": "You are an AI assistant for analyzing documents.",
          "cache_control": {"type": "ephemeral"}
        }
      ],
      "messages": [
        {"role": "user", "content": "Analyze this document..."}
      ]
    }'
Recommendation
Add concrete cost savings examples with specific token counts and dollar amounts to better demonstrate ROI

Progress:

  • Structure prompt with static content first (tools, system, context)
  • Add cache_control to the last block of reusable content
  • Place variable content after cache breakpoint
  • Monitor cache performance via response fields
  • Adjust breakpoints based on usage patterns
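The workflow above can be sketched as a request-payload builder. This is a hand-rolled illustration, not SDK code: `build_request` is a hypothetical helper, and the model ID and placeholder strings are assumptions.

```python
# Sketch: assemble a request payload with static content first and a
# cache breakpoint on the last reusable block. build_request is a
# hypothetical helper, not part of any SDK.

def build_request(tools, system_text, context, user_query):
    """Order: tools -> system -> variable messages, with cache_control
    on the final static block so everything before it is cacheable."""
    system = [
        {"type": "text", "text": system_text},
        {
            "type": "text",
            "text": context,  # last static block carries the breakpoint
            "cache_control": {"type": "ephemeral"},
        },
    ]
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "tools": tools,
        "system": system,
        # variable content goes after the breakpoint
        "messages": [{"role": "user", "content": user_query}],
    }

req = build_request([], "You are an AI assistant.", "<large document>", "Main themes?")
```

Everything before the breakpoint stays byte-identical across calls, so only the trailing user message varies.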

Cache Hierarchy Order

  1. Tools - Function definitions, schemas
  2. System - Instructions, context, examples
  3. Messages - Conversation history

Changes at any level invalidate that level and all subsequent levels.
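The cascade can be illustrated with a small helper. This is purely explanatory; the API performs this invalidation itself, and `invalidated_levels` is a hypothetical function.

```python
# Sketch of the invalidation rule: a change at one level invalidates
# that level and every level after it. Hypothetical helper, for
# illustration only.

LEVELS = ["tools", "system", "messages"]

def invalidated_levels(changed_level):
    """Return the cache levels invalidated by editing `changed_level`."""
    i = LEVELS.index(changed_level)
    return LEVELS[i:]

# Editing the system prompt leaves the tools cache intact but
# invalidates the system and messages caches.
```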

Recommendation
Include a troubleshooting section with specific error messages and solutions for common cache issues

Example 1: Document Analysis

Input:

JSON

  {
    "system": [
      {
        "type": "text",
        "text": "<large document content>",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "What are the main themes?"}
    ]
  }

Output: The first call creates the cache; subsequent calls with the same document hit the cache and process only the new question.

Example 2: Coding Assistant with Tools

Input:

JSON

  {
    "tools": [...tool_definitions...],
    "system": [
      {
        "type": "text",
        "text": "You are a coding assistant.",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }

Output: Tools and system instructions cached, only new code queries processed.

Recommendation
Provide a decision matrix or flowchart for when to use caching vs. when overhead isn't worth it

Content Placement:

  • Static content first: system instructions, context, examples
  • Variable content last: user queries, dynamic data
  • Set cache breakpoint at end of static content

Multiple Breakpoints (max 4):

  • Use when content changes at different frequencies
  • Place before editable sections
  • Helps with 20-block lookback limitation
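The bullets above can be sketched as a helper that attaches breakpoints to selected blocks. `with_breakpoints` and the block texts are illustrative, not API names.

```python
# Sketch: place up to 4 cache breakpoints where content changes at
# different frequencies. Hypothetical helper; block texts are placeholders.

MAX_BREAKPOINTS = 4

def with_breakpoints(blocks, breakpoint_indices):
    """Attach cache_control to the blocks at the given indices."""
    assert len(breakpoint_indices) <= MAX_BREAKPOINTS, "at most 4 breakpoints"
    out = []
    for i, text in enumerate(blocks):
        block = {"type": "text", "text": text}
        if i in breakpoint_indices:
            block["cache_control"] = {"type": "ephemeral"}
        out.append(block)
    return out

# Stable instructions and a rarely edited document each get a breakpoint;
# the frequently edited notes after them do not, so edits there never
# invalidate the cached prefix.
system = with_breakpoints(
    ["stable instructions", "rarely edited document", "frequently edited notes"],
    breakpoint_indices={0, 1},
)
```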

Minimum Cache Sizes:

  • Claude Opus 4.5: 4,096 tokens
  • Claude Sonnet 4.5/4: 1,024 tokens
  • Claude Haiku 4.5: 4,096 tokens
  • Claude Haiku 3.5: 2,048 tokens
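The thresholds above can be encoded in a lookup, useful for deciding up front whether a prompt is worth a cache write. The model ID strings below are illustrative spellings, not verified API identifiers.

```python
# Minimum cacheable prompt sizes, taken from the table above. Prompts
# below the minimum are processed normally, without caching. Model ID
# strings are illustrative, not verified API identifiers.

MIN_CACHEABLE_TOKENS = {
    "claude-opus-4-5": 4096,
    "claude-sonnet-4-5": 1024,
    "claude-haiku-4-5": 4096,
    "claude-haiku-3-5": 2048,
}

def is_cacheable(model, prompt_tokens):
    """True if the prompt meets the model's minimum cacheable size."""
    return prompt_tokens >= MIN_CACHEABLE_TOKENS[model]
```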

Performance Monitoring:

JSON

  {
    "cache_creation_input_tokens": 50000,
    "cache_read_input_tokens": 0,
    "input_tokens": 20
  }
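These usage fields can be turned into a simple hit-rate check. `cache_hit_ratio` is a hypothetical monitoring helper, not an API feature: on a cache write, `cache_creation_input_tokens` is large and `cache_read_input_tokens` is 0; on a hit, the reverse.

```python
# Sketch: interpret the usage fields shown above. Hypothetical helper
# for monitoring; not part of any SDK.

def cache_hit_ratio(usage):
    """Fraction of prompt tokens served from cache."""
    read = usage["cache_read_input_tokens"]
    total = (usage["cache_creation_input_tokens"]
             + read
             + usage["input_tokens"])
    return read / total if total else 0.0

first_call = {"cache_creation_input_tokens": 50000,
              "cache_read_input_tokens": 0, "input_tokens": 20}
later_call = {"cache_creation_input_tokens": 0,
              "cache_read_input_tokens": 50000, "input_tokens": 20}
```

A ratio near zero on repeated identical prefixes signals that something is invalidating the cache between calls.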

Cache Invalidation:

  • Don't modify cached content unnecessarily
  • Web search/citations toggles invalidate system cache
  • Tool choice changes invalidate message cache

Concurrent Requests:

  • Cache only available after first response begins
  • Wait for first response before sending parallel requests
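The sequencing rule above can be sketched with asyncio: await one request so the cache exists, then fan out the rest. `call_api` is a stand-in stub, not a real client call.

```python
# Sketch: with a cold cache, send one request first (it creates the
# cache), then issue the remaining requests in parallel so they can all
# read it. call_api is a placeholder for a real API call.

import asyncio

async def call_api(prompt):
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"response to {prompt}"

async def run(prompts):
    first = await call_api(prompts[0])  # cache is created here
    rest = await asyncio.gather(*(call_api(p) for p in prompts[1:]))
    return [first, *rest]

results = asyncio.run(run(["warm the cache", "q1", "q2"]))
```

Sending all requests at once instead would make every request pay the cache-write price, since none of them sees a finished cache.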

Content Limitations:

  • Cannot cache thinking blocks directly
  • Cannot cache sub-content blocks (cache parent instead)
  • Cannot cache empty text blocks

Pricing Structure:

  • 5-minute cache writes: 1.25x base price
  • 1-hour cache writes: 2x base price
  • Cache reads: 0.1x base price
  • Only pay for what's actually cached/read
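The multipliers above are easy to turn into a back-of-the-envelope cost calculator. `base_price_per_mtok` is an illustrative parameter, not a published price.

```python
# Cost multipliers from the pricing list above, relative to the base
# input-token price. base_price_per_mtok is illustrative, not a quote.

MULTIPLIERS = {
    "cache_write_5m": 1.25,
    "cache_write_1h": 2.0,
    "cache_read": 0.10,
    "input": 1.0,
}

def token_cost(kind, tokens, base_price_per_mtok):
    """Dollar cost of processing `tokens` tokens of the given kind."""
    return MULTIPLIERS[kind] * base_price_per_mtok * tokens / 1_000_000

# Re-reading a cached 50k-token prefix costs a tenth of reprocessing it,
# so the 1.25x write premium is recovered after a couple of cache hits.
write = token_cost("cache_write_5m", 50_000, base_price_per_mtok=3.0)
read = token_cost("cache_read", 50_000, base_price_per_mtok=3.0)
```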

Lookback Window:

  • System checks only 20 blocks backward from breakpoint
  • Set explicit breakpoints for content beyond 20 blocks
  • Place breakpoints before frequently edited content
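The window rule above can be sketched as a helper that marks a breakpoint at least every 20 blocks. This is a hypothetical illustration of the spacing only; for very long prompts it could exceed the 4-breakpoint limit, which a real implementation would also have to respect.

```python
# Sketch: the system only looks 20 blocks back from each breakpoint, so
# content further back needs its own explicit breakpoint. Hypothetical
# helper illustrating the spacing; ignores the 4-breakpoint cap.

LOOKBACK = 20

def add_breakpoints(blocks):
    """Mark every 20th block (and the last) with cache_control."""
    out = []
    for i, text in enumerate(blocks):
        block = {"type": "text", "text": text}
        if (i + 1) % LOOKBACK == 0 or i == len(blocks) - 1:
            block["cache_control"] = {"type": "ephemeral"}
        out.append(block)
    return out

marked = add_breakpoints([f"block {i}" for i in range(45)])
```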
Grade: A-

AI Skill Framework Scorecard

Criteria Breakdown

  • Quick Start: 11/15
  • Workflow: 11/15
  • Examples: 15/20
  • Completeness: 15/20
  • Format: 11/15
  • Conciseness: 11/15