baoyu-url-to-markdown

Fetch any URL and convert to markdown using Chrome CDP. Supports two modes - auto-capture on page load, or wait for user signal (for pages requiring login). Use when user wants to save a webpage as markdown.

View Source
name:baoyu-url-to-markdowndescription:Fetch any URL and convert to markdown using Chrome CDP. Supports two modes - auto-capture on page load, or wait for user signal (for pages requiring login). Use when user wants to save a webpage as markdown.

URL to Markdown

Fetches any URL via Chrome CDP and converts HTML to clean markdown.

Script Directory

Important: All scripts are located in the scripts/ subdirectory of this skill.

Agent Execution Instructions:

  • Determine this SKILL.md file's directory path as SKILL_DIR

  • Script path = ${SKILL_DIR}/scripts/.ts

  • Replace all ${SKILL_DIR} in this document with the actual path
  • Script Reference:

    ScriptPurpose
    scripts/main.tsCLI entry point for URL fetching

    Preferences (EXTEND.md)

    Use Bash to check EXTEND.md existence (priority order):

    # Check project-level first
    test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"

    Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)


    test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"

    ┌────────────────────────────────────────────────────────┬───────────────────┐
    │ Path │ Location │
    ├────────────────────────────────────────────────────────┼───────────────────┤
    │ .baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ Project directory │
    ├────────────────────────────────────────────────────────┼───────────────────┤
    │ $HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ User home │
    └────────────────────────────────────────────────────────┴───────────────────┘

    ┌───────────┬───────────────────────────────────────────────────────────────────────────┐
    │ Result │ Action │
    ├───────────┼───────────────────────────────────────────────────────────────────────────┤
    │ Found │ Read, parse, apply settings │
    ├───────────┼───────────────────────────────────────────────────────────────────────────┤
    │ Not found │ Use defaults │
    └───────────┴───────────────────────────────────────────────────────────────────────────┘

    EXTEND.md Supports: Default output directory | Default capture mode | Timeout settings

    Features

  • Chrome CDP for full JavaScript rendering

  • Two capture modes: auto or wait-for-user

  • Clean markdown output with metadata

  • Handles login-required pages via wait mode
  • Usage

    # Auto mode (default) - capture when page loads
    npx -y bun ${SKILL_DIR}/scripts/main.ts <url>

    Wait mode - wait for user signal before capture


    npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait

    Save to specific file


    npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md

    Options

    OptionDescription
    <url>URL to fetch
    -o <path>Output file path (default: auto-generated)
    --waitWait for user signal before capturing
    --timeout <ms>Page load timeout (default: 30000)

    Capture Modes

    ModeBehaviorUse When
    Auto (default)Capture on network idlePublic pages, static content
    Wait (--wait)User signals when readyLogin-required, lazy loading, paywalls

    Wait mode workflow:

  • Run with --wait → script outputs "Press Enter when ready"

  • Ask user to confirm page is ready

  • Send newline to stdin to trigger capture
  • Output Format

    YAML front matter with url, title, description, author, published, captured_at fields, followed by converted markdown content.

    Output Directory

    url-to-markdown/<domain>/<slug>.md

  • : From page title or URL path (kebab-case, 2-6 words)

  • Conflict resolution: Append timestamp -YYYYMMDD-HHMMSS.md
  • Environment Variables

    VariableDescription
    URL_CHROME_PATHCustom Chrome executable path
    URL_DATA_DIRCustom data directory
    URL_CHROME_PROFILE_DIRCustom Chrome profile directory

    Troubleshooting: Chrome not found → set URL_CHROME_PATH. Timeout → increase --timeout. Complex pages → try --wait mode.

    Extension Support

    Custom configurations via EXTEND.md. See Preferences section for paths and supported options.