dummy-dataset

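Generate realistic dummy datasets for testing, with support for custom columns, constraints, and output formats (CSV, JSON, SQL, Python script). Use it when creating test data, building mock datasets, or generating sample data for development and demos.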

Category: Product

Installation

Download and extract it into your skills directory.

Copy the command and send it to OpenClaw to install automatically:

Download and install this skill: https://openskills.cc/api/download?slug=phuryn-pm-execution-skills-dummy-dataset&locale=zh&source=copy
name: dummy-dataset
description: "Generate realistic dummy datasets for testing with customizable columns, constraints, and output formats (CSV, JSON, SQL, Python script). Use when creating test data, building mock datasets, or generating sample data for development and demos."

Dummy Dataset Generation

Generate realistic dummy datasets for testing with customizable columns, constraints, and output formats (CSV, JSON, SQL, Python script). Creates executable scripts or direct data files for immediate use.

Use when: Creating test data, generating sample datasets, building realistic mock data for development, or populating test environments.

Arguments:

  • $PRODUCT: The product or system name

  • $DATASET_TYPE: Type of data (e.g., customer feedback, transactions, user profiles)

  • $ROWS: Number of rows to generate (default: 100)

  • $COLUMNS: Specific columns or fields to include

  • $FORMAT: Output format (CSV, JSON, SQL, Python script)

  • $CONSTRAINTS: Additional constraints or business rules
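
The `$NAME` placeholders above happen to use the same syntax as Python's standard `string.Template`, so one way to fill them is sketched below. This is an illustration under assumptions, not part of the skill: the `TEMPLATE` fragment and the `render_template` helper are hypothetical, and `safe_substitute` is chosen so that any argument you leave out stays as a literal placeholder instead of raising an error.

```python
from string import Template

# Hypothetical fragment of a script template that uses the skill's placeholders
TEMPLATE = 'ROWS = $ROWS\nFILENAME = "$DATASET_TYPE.csv"\n'

def render_template(text, **arguments):
    """Fill $PLACEHOLDERS; any that are not supplied are left untouched."""
    # Apply the documented default of 100 rows when $ROWS is not given
    arguments.setdefault("ROWS", 100)
    return Template(text).safe_substitute(arguments)

if __name__ == "__main__":
    print(render_template(TEMPLATE, DATASET_TYPE="customer_feedback"))
```

`substitute` would raise `KeyError` on a missing argument, which can be preferable once every argument is known to be required.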

Step-by-Step Process

  • Identify dataset type - Understand the data domain

  • Define column specifications - Names, data types, and value ranges

  • Determine row count - How many sample records are needed

  • Select output format - CSV, JSON, SQL INSERT, or Python script

  • Apply realistic patterns - Ensure data looks authentic and valid

  • Add business constraints - Respect business logic and relationships

  • Generate or script data - Create executable output

  • Validate output - Ensure data quality and completeness
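
The "Validate output" step can be as simple as re-reading the records and checking each one against the column specification. Below is one hypothetical way to do that in Python; the `validate_records` function and the shape of its `spec` argument (column name mapped to a predicate) are assumptions for illustration, not an interface defined by the skill.

```python
import re

# Hypothetical spec: each column name maps to a predicate its values must satisfy
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)
SPEC = {
    "id": lambda v: isinstance(v, str) and v.startswith("U"),
    "email": lambda v: isinstance(v, str) and bool(EMAIL_RE.match(v)),
    "rating": lambda v: isinstance(v, int) and 1 <= v <= 5,
}

def validate_records(records, spec):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    seen_ids = set()
    for row_number, record in enumerate(records, start=1):
        # Auto-increment IDs must be unique
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                problems.append(f"row {row_number}: duplicate id {record_id!r}")
            seen_ids.add(record_id)
        # Every specified column must be present
        missing = set(spec) - set(record)
        if missing:
            problems.append(f"row {row_number}: missing {sorted(missing)}")
        # Present values must pass their column's predicate
        for column, is_valid in spec.items():
            if column in record and not is_valid(record[column]):
                problems.append(f"row {row_number}: bad {column}={record[column]!r}")
    return problems

if __name__ == "__main__":
    rows = [
        {"id": "U000001", "email": "ana@gmail.com", "rating": 5},
        {"id": "U000002", "email": "not-an-email", "rating": 7},
    ]
    for problem in validate_records(rows, SPEC):
        print(problem)
```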

Template: Python Script Output

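    import csv
    import random
    from datetime import datetime, timedelta
    
    # Configuration: replace the template placeholders before running
    ROWS = 100  # $ROWS (default: 100)
    FILENAME = "$DATASET_TYPE.csv"  # e.g. "customer_feedback.csv"
    SEED = 42  # fixed seed so the random choices are reproducible
    
    # Small illustrative value pools; extend or replace them for your domain
    FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dana", "Emeka", "Farah"]
    LAST_NAMES = ["Smith", "Garcia", "Li", "Okafor", "Novak", "Patel"]
    EMAIL_DOMAINS = ["gmail.com", "yahoo.com", "company.com"]
    
    # Column definitions with realistic value generators
    columns = {
        "id": "auto-increment",
        "name": "first_last_name",
        "email": "email",
        "created_at": "timestamp",
        # Add more columns...
    }
    
    def generate_value(kind, i, record, rng):
        """Return one realistic value for the given generator type."""
        if kind == "auto-increment":
            return f"U{i:06d}"
        if kind == "first_last_name":
            return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        if kind == "email":
            # Derive the address from the name column so the two stay consistent
            local = record.get("name", "user").lower().replace(" ", ".")
            return f"{local}{i}@{rng.choice(EMAIL_DOMAINS)}"
        if kind == "timestamp":
            # Random moment within the last 90 days
            offset = timedelta(seconds=rng.randint(0, 90 * 24 * 3600))
            return (datetime.now() - offset).isoformat(timespec="seconds")
        raise ValueError(f"Unknown generator type: {kind}")
    
    def generate_dataset(rows=ROWS, seed=SEED):
        """Generate realistic dummy dataset"""
        rng = random.Random(seed)
        data = []
        for i in range(1, rows + 1):
            record = {}
            # Columns are filled in definition order, so "name" exists before "email"
            for column, kind in columns.items():
                record[column] = generate_value(kind, i, record, rng)
            data.append(record)
        return data
    
    def save_as_csv(data, filename):
        """Save dataset as CSV"""
        # Take headers from the column definitions so an empty dataset still works
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(columns))
            writer.writeheader()
            writer.writerows(data)
    
    if __name__ == "__main__":
        dataset = generate_dataset()
        save_as_csv(dataset, FILENAME)
        print(f"Generated {len(dataset)} records in {FILENAME}")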

Example Dataset Specification

Dataset Type: Customer Feedback

Columns:

  • feedback_id (auto-increment, U001, U002...)

  • customer_name (realistic names)

  • email (valid email format)

  • feedback_date (dates within the last 90 days)

  • rating (1-5 stars)

  • category (Bug, Feature Request, Complaint, Praise)

  • text (realistic feedback)

  • product (electronics, clothing, home)
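
A per-row generator for the columns above might look like the sketch below. The value pools, the `TEXTS` sentence templates, and the `make_feedback_row` name are all hypothetical; ratings and categories are drawn independently here, so this sketch does not yet apply the business rules listed under the constraints.

```python
import random
from datetime import date, timedelta

# Illustrative value pools (assumptions); a library such as Faker could replace them
FIRST = ["Maya", "Liam", "Sofia", "Noah", "Aiko", "Omar"]
LAST = ["Reyes", "Kowalski", "Nguyen", "Brown", "Haddad", "Moreau"]
DOMAINS = ["gmail.com", "yahoo.com", "company.com"]
CATEGORIES = ["Bug", "Feature Request", "Complaint", "Praise"]
PRODUCTS = ["electronics", "clothing", "home"]
# Hypothetical one-line feedback templates per category
TEXTS = {
    "Bug": "The {product} page keeps crashing when I check out.",
    "Feature Request": "Please add a wishlist to the {product} section.",
    "Complaint": "My {product} order arrived late and damaged.",
    "Praise": "Great {product} selection and fast delivery!",
}

def make_feedback_row(i, rng, today):
    """Build one feedback record; rating and category are independent here."""
    first, last = rng.choice(FIRST), rng.choice(LAST)
    category, product = rng.choice(CATEGORIES), rng.choice(PRODUCTS)
    return {
        "feedback_id": f"U{i:03d}",  # U001, U002, ...
        "customer_name": f"{first} {last}",
        "email": f"{first}.{last}{i}@{rng.choice(DOMAINS)}".lower(),
        # 0..89 days back, i.e. within the last 90 days including today
        "feedback_date": (today - timedelta(days=rng.randint(0, 89))).isoformat(),
        "rating": rng.randint(1, 5),
        "category": category,
        "text": TEXTS[category].format(product=product),
        "product": product,
    }

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible output
    for i in range(1, 4):
        print(make_feedback_row(i, rng, date.today()))
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the output reproducible in tests.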

Constraints:

  • Ratings are skewed: 40% 5-star, 30% 4-star, 20% 3-star, 10% 1- or 2-star

  • The Bug category appears only with ratings 1-3

  • The Feature Request category appears only with ratings 3-5

  • Email domains realistic (gmail, yahoo, company.com)
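
One way to honor the rating and category rules together is to draw the rating from the skewed distribution first and then pick a category that is compatible with it. This is a sketch under assumptions: splitting the 10% share evenly between 1 and 2 stars is a guess, Complaint and Praise are treated as allowed at every rating because the rules say nothing about them, and `sample_rating_and_category` is a hypothetical name.

```python
import random
from collections import Counter

# Rating weights from the rules; the even 5/5 split of the 1-2 star share is an assumption
RATING_WEIGHTS = {5: 40, 4: 30, 3: 20, 2: 5, 1: 5}

# Allowed ratings per category; Complaint and Praise are unrestricted in the rules
CATEGORY_RATINGS = {
    "Bug": range(1, 4),              # ratings 1-3
    "Feature Request": range(3, 6),  # ratings 3-5
    "Complaint": range(1, 6),
    "Praise": range(1, 6),
}

def sample_rating_and_category(rng):
    """Draw a rating from the skewed distribution, then a compatible category."""
    rating = rng.choices(list(RATING_WEIGHTS), weights=list(RATING_WEIGHTS.values()))[0]
    allowed = [c for c, ratings in CATEGORY_RATINGS.items() if rating in ratings]
    return rating, rng.choice(allowed)

if __name__ == "__main__":
    rng = random.Random(7)
    pairs = [sample_rating_and_category(rng) for _ in range(10_000)]
    # Counts should land near 4000 / 3000 / 2000 / 500 / 500
    print(Counter(rating for rating, _ in pairs))
```

Drawing the rating first keeps the rating distribution as specified. Drawing rating and category independently and discarding incompatible pairs would skew it, because rating 3 is compatible with four categories while every other rating is compatible with only three.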

Output Deliverables

  • Ready-to-execute Python script OR direct data file

  • CSV file with proper headers and formatting

  • JSON file with valid structure and types

  • SQL INSERT statements for database population

  • Data validation and constraint compliance

  • Realistic, business-appropriate values

  • Documentation of data generation logic

  • Quick-start instructions for using the dataset
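
For the JSON deliverable, a common stumbling block is that Python's `json` module cannot serialize `date` or `datetime` objects by default. A minimal sketch, assuming records are plain dicts, is to pass a `default` hook that emits ISO 8601 strings while numbers stay numbers. The `to_json` and `save_as_json` names are hypothetical.

```python
import json
from datetime import date, datetime

def to_json(records):
    """Serialize records to a JSON array, keeping numbers as JSON numbers."""
    def encode_extra(value):
        # Called only for objects json cannot encode natively
        if isinstance(value, (date, datetime)):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")
    return json.dumps(records, indent=2, default=encode_extra, ensure_ascii=False)

def save_as_json(records, filename):
    """Write records to a UTF-8 JSON file."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(to_json(records))

if __name__ == "__main__":
    sample = [{"feedback_id": "U001", "rating": 5, "feedback_date": date(2024, 5, 1)}]
    print(to_json(sample))
```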

Output Formats

    CSV: Flat tabular format, easy to import into spreadsheets and databases

    JSON: Nested structure, ideal for APIs and NoSQL databases

    SQL: INSERT statements, directly executable on relational databases

    Python Script: Executable generator for custom or large datasets
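
When emitting SQL INSERT statements as text, string values need escaping: standard SQL doubles a single quote inside a string literal. The sketch below assumes table and column names are trusted simple identifiers (they are not quoted), renders booleans as 1/0 for portability, and uses an in-memory `sqlite3` database only to check that the statements execute. The helper names are hypothetical. When loading data directly into a live database, the driver's parameter binding (for example `?` placeholders in `sqlite3`) is the safer choice.

```python
import sqlite3  # used only in the demo to check that the statements execute

def sql_literal(value):
    """Render a Python value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return str(int(value))  # booleans as 1/0 for portability
    if isinstance(value, (int, float)):
        return repr(value)  # assumes finite numbers
    # Standard SQL escapes a single quote by doubling it
    return "'" + str(value).replace("'", "''") + "'"

def to_insert_statements(table, records):
    """Yield one INSERT statement per record (identifiers assumed trusted)."""
    for record in records:
        columns = ", ".join(record)
        values = ", ".join(sql_literal(v) for v in record.values())
        yield f"INSERT INTO {table} ({columns}) VALUES ({values});"

if __name__ == "__main__":
    rows = [{"feedback_id": "U001", "rating": 5, "text": "Don't change a thing!"}]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feedback (feedback_id TEXT, rating INTEGER, text TEXT)")
    for statement in to_insert_statements("feedback", rows):
        print(statement)
        conn.execute(statement)
    print(conn.execute("SELECT * FROM feedback").fetchall())
```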
