AI Skill Report Card

Generating Test Data

B+ · 78 · Feb 13, 2026 · Source: Web

Quick Start

Python
# Generate 100 realistic user records
import faker

fake = faker.Faker()

users = [
    {
        "id": i,
        "name": fake.name(),
        "email": fake.email(),
        "created_at": fake.date_time_between(start_date='-2y'),
        "country": fake.country_code()
    }
    for i in range(1, 101)
]

Workflow

  1. Analyze data requirements - Identify fields, types, constraints, relationships
  2. Choose generation strategy - Faker library, custom generators, or pattern-based
  3. Define realistic constraints - Date ranges, value distributions, foreign keys
  4. Generate and validate - Create data, check for realistic patterns
  5. Export in target format - JSON, CSV, SQL inserts, or API fixtures

Progress:

  • Map required data schema
  • Set up Faker providers or custom generators
  • Define realistic business rules and constraints
  • Generate sample dataset
  • Validate data quality and export
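The five workflow steps can be sketched end to end using only the standard library, which keeps the example runnable without Faker installed. The field names, the country list, the fixed reference date, and the helper names (`generate_users`, `validate_users`, `to_csv`) are illustrative assumptions, not part of the original skill.

```python
import csv
import io
import random
from datetime import datetime, timedelta

# Step 1: schema (hypothetical fields chosen for illustration)
SCHEMA = ["id", "email", "country", "created_at"]
COUNTRIES = ["US", "DE", "JP", "BR"]


def generate_users(count, seed=42, now=None):
    """Steps 2-3: a custom generator with a date-range constraint."""
    rng = random.Random(seed)  # local RNG so the output is reproducible
    now = now or datetime(2026, 1, 1)  # assumed fixed reference date
    users = []
    for i in range(1, count + 1):
        # Constrain created_at to roughly the last two years
        created = now - timedelta(days=rng.randint(0, 730))
        users.append({
            "id": i,
            "email": f"user{i}@example.com",
            "country": rng.choice(COUNTRIES),
            "created_at": created.isoformat(),
        })
    return users


def validate_users(users):
    """Step 4: return a list of problems; an empty list means the data passed."""
    problems = []
    ids = [u["id"] for u in users]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids")
    for u in users:
        if "@" not in u["email"]:
            problems.append(f"bad email for id {u['id']}")
    return problems


def to_csv(users):
    """Step 5: export the records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SCHEMA)
    writer.writeheader()
    writer.writerows(users)
    return buf.getvalue()


users = generate_users(100)
problems = validate_users(users)
csv_text = to_csv(users)
```

Swapping `generate_users` for a Faker-based generator would leave the validation and export steps unchanged, which is one reason to keep the steps as separate functions.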

Examples

Example 1

Input: "Need 50 e-commerce orders with products, customers, and timestamps from last 6 months"

Output:

Python
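from faker import Faker

fake = Faker()

def make_order():
    # Build the items first so the total can be computed from them
    items = [
        {"product": fake.catch_phrase(), "price": fake.random_int(10, 200)}
        for _ in range(fake.random_int(1, 4))
    ]
    return {
        # fix_len=True keeps the numeric part at exactly six digits
        "order_id": f"ORD-{fake.random_number(digits=6, fix_len=True)}",
        "customer_email": fake.email(),
        "items": items,
        "total": sum(item["price"] for item in items),
        # '-180d' is roughly six months; Faker reads a lowercase 'm' as minutes
        "created_at": fake.date_time_between(start_date='-180d')
    }

orders = [make_order() for _ in range(50)]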

Example 2

Input: "Mock API responses for user authentication flow"

Output:

JSON
{
  "login_success": {
    "token": "eyJ0eXAi...",
    "expires": "2024-01-15T10:30:00Z"
  },
  "login_error": {
    "error": "invalid_credentials",
    "message": "Email or password incorrect"
  },
  "profile": {
    "id": 1847,
    "name": "Sarah Chen",
    "role": "admin"
  }
}
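A hard-coded `expires` value goes stale once the date passes, so tests that check token expiry may prefer a timestamp computed relative to a reference time. The following is a minimal sketch of that idea; `build_auth_fixtures`, the `ttl_minutes` parameter, and the placeholder token string are hypothetical and not part of the original example.

```python
import json
from datetime import datetime, timedelta, timezone


def build_auth_fixtures(now=None, ttl_minutes=60):
    """Build the three mock responses with an expiry relative to `now`."""
    now = now or datetime.now(timezone.utc)
    expires = (now + timedelta(minutes=ttl_minutes)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Placeholder token: deliberately not a real JWT
        "login_success": {"token": "test-token-not-a-real-jwt", "expires": expires},
        "login_error": {
            "error": "invalid_credentials",
            "message": "Email or password incorrect",
        },
        "profile": {"id": 1847, "name": "Sarah Chen", "role": "admin"},
    }


# Passing a fixed `now` keeps the fixture deterministic inside a test
fixtures = build_auth_fixtures(now=datetime(2026, 1, 1, 9, 30, tzinfo=timezone.utc))
fixtures_json = json.dumps(fixtures, indent=2)
```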

Best Practices

  • Use the Faker library for consistent, locale-aware data generation
  • Maintain referential integrity (consistent user IDs across related tables)
  • Include edge cases: null values, empty strings, boundary conditions
  • Generate data in realistic distributions (80/20 rules, seasonal patterns)
  • Seed random generators for reproducible test datasets
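Several of these practices can be combined in one small generator. The sketch below uses the standard library's `random.Random` rather than Faker so it runs anywhere; the table shapes, the `generate_dataset` name, and the specific weights are assumptions made for illustration.

```python
import random


def generate_dataset(n_users=20, n_orders=200, seed=7):
    rng = random.Random(seed)  # seeded: the same seed yields the same dataset

    users = [
        {"user_id": i, "email": f"user{i}@example.com"}
        for i in range(1, n_users + 1)
    ]

    # Skewed distribution: the first 20% of users carry 80% of the sampling weight
    heavy = max(1, n_users // 5)
    light = n_users - heavy
    weights = [0.8 / heavy] * heavy
    if light:
        weights += [0.2 / light] * light

    orders = []
    for order_id in range(1, n_orders + 1):
        buyer = rng.choices(users, weights=weights, k=1)[0]
        orders.append({
            "order_id": order_id,
            # Referential integrity: user_id is always taken from the users table
            "user_id": buyer["user_id"],
            # Boundary condition: a price of 0 is allowed
            "price": rng.randint(0, 200),
            # Edge cases: the note is sometimes None or an empty string
            "note": rng.choice([None, "", "gift wrap", "leave at door"]),
        })
    return users, orders


users, orders = generate_dataset()
```

With Faker, the equivalent of the seeded `random.Random` is calling `Faker.seed(...)` before generating data.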

Common Pitfalls

  • Creating unrealistic data patterns (all users born on same day)
  • Ignoring business constraints (negative prices, future birth dates)
  • Generating too much data for development environments
  • Using production data patterns that reveal real information
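Some of these pitfalls can be caught automatically before a dataset is used. The following is a minimal sketch of such a check, assuming each record has `price` and `birth_date` fields; the field names and the `find_violations` helper are hypothetical.

```python
from datetime import date


def find_violations(records, today=None):
    """Return (index, message) pairs for records that break basic business rules."""
    today = today or date.today()
    violations = []
    for i, rec in enumerate(records):
        if rec["price"] < 0:
            violations.append((i, "negative price"))
        if rec["birth_date"] > today:
            violations.append((i, "birth date in the future"))
    # Dataset-level check: identical birth dates suggest an unrealistic pattern
    birth_dates = {rec["birth_date"] for rec in records}
    if len(records) > 1 and len(birth_dates) == 1:
        violations.append((None, "all records share one birth date"))
    return violations


sample = [
    {"price": 25, "birth_date": date(1990, 5, 1)},
    {"price": -3, "birth_date": date(1985, 2, 14)},
    {"price": 10, "birth_date": date(2999, 1, 1)},
]
violations = find_violations(sample, today=date(2026, 1, 1))
```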

Scorecard

Grade: B+ (AI Skill Framework)

Criteria Breakdown

  • Quick Start: 15/15
  • Workflow: 13/15
  • Examples: 17/20
  • Completeness: 15/20
  • Format: 15/15
  • Conciseness: 13/15