AI Skill Report Card
Generating Test Data
Quick Start
```python
# Generate 100 realistic user records
import faker

fake = faker.Faker()

users = [
    {
        "id": i,
        "name": fake.name(),
        "email": fake.email(),
        "created_at": fake.date_time_between(start_date='-2y'),
        "country": fake.country_code(),
    }
    for i in range(1, 101)
]
```
Workflow
- Analyze data requirements: identify fields, types, constraints, and relationships
- Choose a generation strategy: the Faker library, custom generators, or pattern-based generation
- Define realistic constraints: date ranges, value distributions, foreign keys
- Generate and validate: create the data, then check it for realistic patterns
- Export in the target format: JSON, CSV, SQL inserts, or API fixtures
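As a rough illustration of the export step, the sketch below writes the same records as JSON, CSV, and SQL `INSERT` statements using only the Python standard library. The `records` list, the `users` table, and the helper names (`to_json`, `to_csv`, `to_sql_inserts`, `sql_literal`) are hypothetical. The quoting in `sql_literal` is deliberately minimal and is only meant for fixture files, not for building queries against a live database.

```python
import csv
import io
import json

# Hypothetical sample records standing in for generated data
records = [
    {"id": 1, "name": "Ana Silva", "country": "BR"},
    {"id": 2, "name": "Kate O'Brien", "country": "IE"},
]

def to_json(rows):
    # default=str lets datetime values serialize as ISO-like strings
    return json.dumps(rows, indent=2, default=str)

def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def sql_literal(value):
    # Minimal escaping for fixture files only; use parameterized queries with a real database
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def to_sql_inserts(table, rows):
    cols = list(rows[0])
    return [
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES "
        f"({', '.join(sql_literal(row[c]) for c in cols)});"
        for row in rows
    ]

print(to_csv(records))
for stmt in to_sql_inserts("users", records):
    print(stmt)
```

Doubling the single quote in `O'Brien` is the standard SQL escape. Names with apostrophes are a useful edge case to include in any generated dataset.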
Progress:
- Map required data schema
- Set up Faker providers or custom generators
- Define realistic business rules and constraints
- Generate sample dataset
- Validate data quality and export
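One way to validate data quality is to re-check the generated rows against the business rules you defined. This minimal sketch assumes orders shaped like the e-commerce example in this guide, with `order_id`, `items[].price`, `total`, and `created_at` fields. The `validate_orders` function and the sample rows are hypothetical.

```python
from datetime import datetime

# Hypothetical constraint checker for order records
def validate_orders(orders, start, end):
    """Return a list of human-readable problems found in the dataset."""
    problems = []
    seen_ids = set()
    for order in orders:
        oid = order["order_id"]
        if oid in seen_ids:
            problems.append(f"{oid}: duplicate order_id")
        seen_ids.add(oid)
        if any(item["price"] <= 0 for item in order["items"]):
            problems.append(f"{oid}: non-positive price")
        if order["total"] != sum(item["price"] for item in order["items"]):
            problems.append(f"{oid}: total does not match items")
        if not start <= order["created_at"] <= end:
            problems.append(f"{oid}: created_at outside {start:%Y-%m-%d}..{end:%Y-%m-%d}")
    return problems

sample = [
    {"order_id": "ORD-100001", "items": [{"price": 20}, {"price": 35}],
     "total": 55, "created_at": datetime(2024, 3, 1)},
    {"order_id": "ORD-100001", "items": [{"price": -5}],
     "total": -5, "created_at": datetime(2030, 1, 1)},
]
for problem in validate_orders(sample, datetime(2024, 1, 1), datetime(2024, 6, 30)):
    print(problem)
```

Here the second sample row is flagged three times: for reusing an ID, for a negative price, and for a date outside the window. Its `total` matches its items, so that check passes.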
Examples
Example 1: Input: "Need 50 e-commerce orders with products, customers, and timestamps from last 6 months" Output:
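```python
import faker

fake = faker.Faker()

def make_order():
    # Build the line items first so "total" can be computed from them
    # (inside a single dict literal, "items" would be undefined)
    items = [
        {"product": fake.catch_phrase(), "price": fake.random_int(10, 200)}
        for _ in range(fake.random_int(1, 4))
    ]
    return {
        # fix_len=True guarantees exactly 6 digits
        "order_id": f"ORD-{fake.random_number(digits=6, fix_len=True)}",
        "customer_email": fake.email(),
        "items": items,
        "total": sum(item["price"] for item in items),
        "created_at": fake.date_time_between(start_date='-6m'),
    }

orders = [make_order() for _ in range(50)]
```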
Example 2: Input: "Mock API responses for user authentication flow" Output:
```json
{
  "login_success": {"token": "eyJ0eXAi...", "expires": "2024-01-15T10:30:00Z"},
  "login_error": {"error": "invalid_credentials", "message": "Email or password incorrect"},
  "profile": {"id": 1847, "name": "Sarah Chen", "role": "admin"}
}
```
Best Practices
- Use the Faker library for consistent, locale-aware data generation
- Maintain referential integrity (consistent user IDs across related tables)
- Include edge cases: null values, empty strings, boundary conditions
- Generate data with realistic distributions (80/20 skews, seasonal patterns)
- Seed random generators for reproducible test datasets
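Several of these practices can be combined in one generator. This sketch uses the standard library's `random.Random` instead of Faker; with Faker, `Faker.seed()` does the seeding. A fixed seed makes the dataset reproducible, and every order references an existing user ID. Weighted `choices` gives roughly an 80/20 skew toward two heavy users, and notes sometimes carry edge-case values. `USERS`, `make_orders`, and the 255-character column limit are hypothetical.

```python
import random

# Parent table: users with stable, known IDs (hypothetical schema)
USERS = [{"id": i, "email": f"user{i}@example.com"} for i in range(1, 11)]

def make_orders(seed, n=200):
    rng = random.Random(seed)  # seeded, so the same dataset is produced on every run
    user_ids = [u["id"] for u in USERS]
    # 80/20 skew: users 1 and 2 (20% of users) carry 80% of the total weight
    weights = [0.4 if uid <= 2 else 0.025 for uid in user_ids]
    picked = rng.choices(user_ids, weights=weights, k=n)
    return [
        {
            "order_id": i,
            "user_id": uid,  # always an existing USERS id (referential integrity)
            # edge cases: empty string, NULL, and an assumed 255-char column limit
            "note": rng.choice(["gift", "", None, "x" * 255]),
        }
        for i, uid in enumerate(picked, start=1)
    ]

orders = make_orders(seed=42)
```

Because order user IDs are drawn only from `USERS`, loading both tables into a database with a foreign key between them should not produce orphan rows.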
Common Pitfalls
- Creating unrealistic data patterns (e.g., all users born on the same day)
- Ignoring business constraints (negative prices, future birth dates)
- Generating more data than development environments need
- Reusing production data patterns that could reveal real information
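To avoid both the "everyone born on the same day" and the "future birth date" pitfalls, you can draw birth dates uniformly from an age window relative to a fixed reference date. This is a minimal sketch using the standard library. The 18-to-90 age bounds, `random_birth_date`, and the reference date are assumptions, and the year arithmetic ignores leap days.

```python
import random
from datetime import date, timedelta

def random_birth_date(rng, today, min_age=18, max_age=90):
    # Bound the range so no date is in the future or implausibly old
    # (approximate: 365-day years, leap days ignored)
    latest = today - timedelta(days=365 * min_age)
    earliest = today - timedelta(days=365 * max_age)
    offset = rng.randint(0, (latest - earliest).days)  # inclusive on both ends
    return earliest + timedelta(days=offset)

rng = random.Random(7)
today = date(2024, 1, 15)  # fixed reference date keeps the output reproducible
birth_dates = [random_birth_date(rng, today) for _ in range(1000)]
```

Passing in `today` explicitly, rather than calling `date.today()`, keeps the generated dataset stable across runs.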