dummy-dataset

Dummy Dataset Generation

Generate realistic dummy datasets for testing with customizable columns, constraints, and output formats (CSV, JSON, SQL, Python script). Creates executable scripts or direct data files for immediate use.

Use when: Creating test data, generating sample datasets, building realistic mock data for development, or populating test environments.

Arguments:

$PRODUCT: The product or system name

$DATASET_TYPE: Type of data (e.g., customer feedback, transactions, user profiles)

$ROWS: Number of rows to generate (default: 100)

$COLUMNS: Specific columns or fields to include

$FORMAT: Output format (CSV, JSON, SQL, Python script)

$CONSTRAINTS: Additional constraints or business rules

Step-by-Step Process

Identify dataset type - Understand the data domain

Define column specifications - Names, data types, and value ranges

Determine row count - How many sample records needed

Select output format - CSV, JSON, SQL INSERT, or Python script

Apply realistic patterns - Ensure data looks authentic and valid

Add business constraints - Respect business logic and relationships

Generate or script data - Create executable output

Validate output - Ensure data quality and completeness

Template: Python Script Output

import csv
import json
from datetime import datetime, timedelta
import random

# Configuration
ROWS = $ROWS
FILENAME = "$DATASET_TYPE.csv"

# Column definitions with realistic value generators
columns = {
    "id": "auto-increment",
    "name": "first_last_name",
    "email": "email",
    "created_at": "timestamp",
    # Add more columns...
}

def generate_dataset():
    """Generate realistic dummy dataset"""
    data = []
    for i in range(1, ROWS + 1):
        record = {
            "id": f"U{i:06d}",
            # Generate values based on column definitions
        }
        data.append(record)
    return data

def save_as_csv(data, filename):
    """Save dataset as CSV"""
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)

if __name__ == "__main__":
    dataset = generate_dataset()
    save_as_csv(dataset, FILENAME)
    print(f"Generated {len(dataset)} records in {FILENAME}")

Example Dataset Specification

Dataset Type: Customer Feedback

Columns:

feedback_id (auto-increment, U001, U002...)

customer_name (realistic names)

email (valid email format)

feedback_date (dates last 90 days)

rating (1-5 stars)

category (Bug, Feature Request, Complaint, Praise)

text (realistic feedback)

product (electronics, clothing, home)

Constraints:

Ratings skewed: 40% 5-star, 30% 4-star, 20% 3-star, 10% 1-2 star

Bug category only with ratings 1-3

Feature requests only with ratings 3-5

Email domains realistic (gmail, yahoo, company.com)

Output Deliverables

Ready-to-execute Python script OR direct data file

CSV file with proper headers and formatting

JSON file with valid structure and types

SQL INSERT statements for database population

Data validation and constraint compliance

Realistic, business-appropriate values

Documentation of data generation logic

Quick-start instructions for using the dataset

Output Formats

CSV: Flat tabular format, easy to import into spreadsheets and databases

JSON: Nested structure, ideal for APIs and NoSQL databases

SQL: INSERT statements, directly executable on relational databases

Python Script: Executable generator for custom or large datasets

作者

分类

安装