baoyu-image-gen

利用OpenAI、Google和DashScope API进行AI图像生成。支持文本转图像、参考图像输入及多种宽高比设定。默认采用顺序生成模式,可根据需求启用并行生成功能。适用于用户提出生成、创建或绘制图像请求的场景。

查看详情
name:baoyu-image-gendescription:AI image generation with OpenAI, Google and DashScope APIs. Supports text-to-image, reference images, aspect ratios. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.

Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Google and DashScope (阿里通义万象) providers.

Script Directory

Agent Execution:

  • SKILL_DIR = this SKILL.md file's directory

  • Script path = ${SKILL_DIR}/scripts/main.ts
  • Preferences (EXTEND.md)

    Use Bash to check EXTEND.md existence (priority order):

    # Check project-level first
    test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"

    Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)


    test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"

    ┌──────────────────────────────────────────────────┬───────────────────┐
    │ Path │ Location │
    ├──────────────────────────────────────────────────┼───────────────────┤
    │ .baoyu-skills/baoyu-image-gen/EXTEND.md │ Project directory │
    ├──────────────────────────────────────────────────┼───────────────────┤
    │ $HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md │ User home │
    └──────────────────────────────────────────────────┴───────────────────┘

    ┌───────────┬───────────────────────────────────────────────────────────────────────────┐
    │ Result │ Action │
    ├───────────┼───────────────────────────────────────────────────────────────────────────┤
    │ Found │ Read, parse, apply settings │
    ├───────────┼───────────────────────────────────────────────────────────────────────────┤
    │ Not found │ Use defaults │
    └───────────┴───────────────────────────────────────────────────────────────────────────┘

    EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models

    Schema: references/config/preferences-schema.md

    Usage

    # Basic
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

    With aspect ratio


    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

    High quality


    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

    From prompt files


    npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

    With reference images (Google multimodal or OpenAI edits)


    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

    With reference images (explicit provider/model)


    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

    Specific provider


    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

    DashScope (阿里通义万象)


    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

    Options

    OptionDescription
    --prompt <text>, -pPrompt text
    --promptfiles <files...>Read prompt from files (concatenated)
    --image <path>Output image path (required)
    --provider google\openai\
    --model <id>, -mModel ID (--ref with OpenAI requires GPT Image model, e.g. gpt-image-1.5)
    --ar <ratio>Aspect ratio (e.g., 16:9, 1:1, 4:3)
    --size <WxH>Size (e.g., 1024x1024)
    --quality normal\2k
    --imageSize 1K\2K\
    --ref <files...>Reference images. Supported by Google multimodal and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI
    --n <count>Number of images
    --jsonJSON output

    Environment Variables

    VariableDescription
    OPENAI_API_KEYOpenAI API key
    GOOGLE_API_KEYGoogle API key
    DASHSCOPE_API_KEYDashScope API key (阿里云)
    OPENAI_IMAGE_MODELOpenAI model override
    GOOGLE_IMAGE_MODELGoogle model override
    DASHSCOPE_IMAGE_MODELDashScope model override (default: z-image-turbo)
    OPENAI_BASE_URLCustom OpenAI endpoint
    GOOGLE_BASE_URLCustom Google endpoint
    DASHSCOPE_BASE_URLCustom DashScope endpoint

    Load Priority: CLI args > EXTEND.md > env vars > /.baoyu-skills/.env > ~/.baoyu-skills/.env

    Provider Selection

  • --ref provided + no --provider → auto-select Google first, then OpenAI

  • --provider specified → use it (if --ref, must be google or openai)

  • Only one API key available → use that provider

  • Multiple available → default to Google
  • Quality Presets

    PresetGoogle imageSizeOpenAI SizeUse Case
    normal1K1024pxQuick previews
    2k (default)2K2048pxCovers, illustrations, infographics

    Google imageSize: Can be overridden with --imageSize 1K|2K|4K

    Aspect Ratios

    Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1

  • Google multimodal: uses imageConfig.aspectRatio

  • Google Imagen: uses aspectRatio parameter

  • OpenAI: maps to closest supported size
  • Generation Mode

    Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.

    Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.

    ModeWhen to Use
    Sequential (default)Normal usage, single images, small batches
    ParallelUser explicitly requests, large batches (10+)

    Parallel Settings (when requested):

    SettingValue
    Recommended concurrency4 subagents
    Max concurrency8 subagents
    Use caseLarge batch generation when user requests parallel

    Agent Implementation (parallel mode only):

    # Launch multiple generations in parallel using Task tool

    Each Task runs as background subagent with run_in_background=true


    Collect results via TaskOutput when all complete

    Error Handling

  • Missing API key → error with setup instructions

  • Generation failure → auto-retry once

  • Invalid aspect ratio → warning, proceed with default

  • Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal or OpenAI GPT Image edits)
  • Extension Support

    Custom configurations via EXTEND.md. See Preferences section for paths and supported options.

      baoyu-image-gen - Agent Skills