AI Skill Report Card

Generating Test Data

B+ · 78 · Feb 12, 2026 · Source: Web
YAML
---
name: generating-test-data
description: Generates realistic placeholder data for development and testing. Use when you need sample datasets, mock API responses, or test fixtures.
---

# Quick Start

```python
from faker import Faker
import random

fake = Faker()

# Generate test users
users = [
    {
        "id": i,
        "name": fake.name(),
        "email": fake.email(),
        "created_at": fake.date_this_year().isoformat()
    }
    for i in range(1, 6)
]
```

Workflow

  1. Identify data schema - Define fields, types, and relationships needed
  2. Choose generation method - Faker library, manual patterns, or real data sampling
  3. Generate base dataset - Create core records with realistic values
  4. Add variations and edge cases - Include nulls, extremes, and special scenarios
  5. Export in target format - CSV, JSON, SQL, XML as needed

Progress:

  • Map required fields and data types
  • Install generation tools (faker, mimesis)
  • Create base data generation script
  • Add edge cases and variants
  • Export to required format
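The five workflow steps can be sketched end to end as below. This is a minimal illustration, not the skill's own implementation: it uses only the standard library so it runs without extra installs, and the `SCHEMA` mapping, the city list, and the helper names `generate_rows`, `add_edge_cases`, and `to_csv` are all hypothetical. In practice the lambdas would likely call Faker or mimesis providers instead.

```python
import csv
import io
import random

rng = random.Random(42)  # seeded so reruns produce the same rows

# Steps 1-2: a hypothetical schema mapping field names to generator functions.
SCHEMA = {
    "age": lambda: rng.randint(18, 80),
    "city": lambda: rng.choice(["Lisbon", "Osaka", "Denver"]),
}


def generate_rows(count):
    """Step 3: build `count` base records from SCHEMA, with sequential ids."""
    rows = []
    for i in range(1, count + 1):
        row = {"id": i}
        row.update({name: gen() for name, gen in SCHEMA.items()})
        rows.append(row)
    return rows


def add_edge_cases(rows):
    """Step 4: append one record with missing values and one at the limits."""
    next_id = len(rows) + 1
    extras = [
        {"id": next_id, "age": None, "city": ""},           # null and empty string
        {"id": next_id + 1, "age": 18, "city": "X" * 255},  # minimum age, long string
    ]
    return rows + extras


def to_csv(rows):
    """Step 5: export to CSV text; None is written as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "age", "city"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = add_edge_cases(generate_rows(5))
csv_text = to_csv(rows)
```

Keeping generation, edge-case injection, and export as separate functions makes it easy to swap the export target (JSON, SQL, XML) without touching the generators.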

Examples

Example 1:
Input: E-commerce product catalog (50 items)
Output:

Python
# Reuses `fake` and `random` from Quick Start
products = [
    {
        "sku": f"PROD-{1000+i}",
        "name": fake.catch_phrase(),
        "price": round(random.uniform(9.99, 299.99), 2),
        "category": random.choice(["Electronics", "Clothing", "Home"]),
        "in_stock": random.choice([True, False])
    }
    for i in range(50)
]

Example 2:
Input: CSV for user testing (100 rows)
Output:

Python
import csv
# Reuses `fake` and `random` from Quick Start
with open('test_users.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['id', 'name', 'age', 'city'])
    writer.writeheader()
    for i in range(100):
        writer.writerow({
            'id': i+1,
            'name': fake.name(),
            'age': random.randint(18, 80),
            'city': fake.city()
        })

Example 3:
Input: API response with nested data
Output:

Python
# Reuses `fake` and `random` from Quick Start
api_response = {
    "status": "success",
    "data": {
        "orders": [
            {
                "order_id": fake.uuid4(),
                "customer": fake.name(),
                "items": [
                    {"product": fake.word(), "qty": random.randint(1, 5)}
                    for _ in range(random.randint(1, 4))
                ],
                "total": round(random.uniform(25.00, 500.00), 2)
            }
            for _ in range(10)
        ]
    }
}

Best Practices

  • Use Faker library for realistic personal data (names, addresses, emails)
  • Use mimesis for performance-critical large datasets
  • Seed random generators for reproducible test data: fake.seed_instance(42), plus random.seed(42) when the examples also call the random module directly
  • Create data templates for common schemas (users, products, transactions)
  • Include realistic distributions (80/20 rule, bell curves for ages/prices)
  • Add intentional edge cases: empty strings, max lengths, special characters

Common Pitfalls

  • Using obviously fake data like "Test User 1", which undermines realistic testing
  • Forgetting to handle foreign key relationships in relational data
  • Creating datasets too small to reveal performance issues
  • Using production data patterns that leak sensitive information
  • Not including enough data variety to catch edge-case bugs
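The foreign-key pitfall can be avoided by generating parent records first and drawing child references only from the ids that exist. The sketch below is a minimal standard-library illustration; the table shapes, the sample names, and the `orphaned` helper are hypothetical and not part of the skill itself.

```python
import random

rng = random.Random(7)  # seeded for reproducible output

# Parent table: generated first so its ids are known.
names = ["Ana Souza", "Kenji Mori", "Lena Vogel", "Omar Haddad", "Priya Nair"]
users = [{"id": i, "name": name} for i, name in enumerate(names, start=1)]
user_ids = [u["id"] for u in users]

# Child table: every user_id is drawn from existing parent ids.
orders = [
    {
        "order_id": 1000 + n,
        "user_id": rng.choice(user_ids),
        "total": round(rng.uniform(25.0, 500.0), 2),
    }
    for n in range(20)
]


def orphaned(children, parents, fk="user_id"):
    """Return child rows whose foreign key matches no parent id."""
    parent_ids = {p["id"] for p in parents}
    return [c for c in children if c[fk] not in parent_ids]


broken_refs = orphaned(orders, users)
```

Running a check like `orphaned` before export catches dangling references that would otherwise surface as constraint violations when the fixtures are loaded into a database.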
Grade B+
AI Skill Framework
Scorecard
Criteria Breakdown
Quick Start: 11/15
Workflow: 11/15
Examples: 15/20
Completeness: 15/20
Format: 11/15
Conciseness: 11/15