markitdown
Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.
Category
Document Processing
Install
Download and extract to your skills directory
Copy command and send to OpenClaw for auto-install:
Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-markitdown&locale=en&source=copy
MarkItDown - Microsoft Document to Markdown Tool
Overview
MarkItDown is a Python tool developed by Microsoft that can convert 15+ file formats into LLM-friendly Markdown text, supporting PDFs, Office documents, image OCR, audio transcription, YouTube subtitle extraction, and more.
Use Cases
1. Academic Literature Organization
Convert academic paper PDFs to Markdown so that AI models can read and process them more easily. Image OCR recognizes chart text in scanned documents, and the AI-enhanced features can generate image descriptions automatically, which makes the tool well suited to literature reviews and building knowledge bases.
2. Bulk Office Document Processing
Quickly convert Word, PowerPoint, Excel, and other office documents into structured Markdown. Batch processing handles multiple files while preserving tables, formatting, and content structure, which suits document migration, content management, and automated workflows.
3. Multimedia Content Extraction
Extract text from images (OCR), generate transcripts from audio files, and retrieve subtitles from YouTube videos. Supports AI-enhanced image description generation, suitable for accessibility, content archiving, and multimedia analysis scenarios.
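The bulk office document use case above can be sketched as a small Python helper. This is a minimal sketch, not part of MarkItDown itself: batch_convert, markitdown_converter, and OFFICE_SUFFIXES are hypothetical names, and the only MarkItDown calls assumed are MarkItDown(), convert(), and text_content, as shown in the FAQ.

```python
from pathlib import Path
from typing import Callable

# Hypothetical default: the office formats named in the use case.
OFFICE_SUFFIXES = {".docx", ".pptx", ".xlsx"}


def markitdown_converter() -> Callable[[Path], str]:
    """Return a path -> Markdown function backed by MarkItDown.

    The import is lazy so the batch logic below can be exercised
    without the markitdown package installed.
    """
    from markitdown import MarkItDown

    md = MarkItDown()
    return lambda path: md.convert(str(path)).text_content


def batch_convert(src_dir: Path, out_dir: Path,
                  to_markdown: Callable[[Path], str],
                  suffixes=OFFICE_SUFFIXES) -> dict:
    """Convert every matching file under src_dir into out_dir.

    Returns {source path: error message} for files that failed,
    so one bad document does not stop the whole batch.
    """
    failures = {}
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        try:
            text = to_markdown(src)
        except Exception as exc:
            failures[src] = str(exc)
            continue
        # Mirror the source tree and keep the original extension in the
        # name, so report.docx and report.pptx do not overwrite each other.
        rel = src.relative_to(src_dir)
        target = out_dir / rel.parent / (rel.name + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
    return failures
```

A typical call would be batch_convert(Path("docs"), Path("markdown"), markitdown_converter()), where both directory names are placeholders. Passing the converter in as a function keeps the directory walk independent of which formats are installed.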
Core Features
1. Support for 15+ File Formats
Covers PDF, DOCX, PPTX, XLSX, images (JPEG/PNG/GIF/WebP), audio (WAV/MP3), HTML, CSV, JSON, XML, ZIP, EPUB, and YouTube video links. Offers both command-line and Python API modes for flexible integration into various workflows.
2. AI-enhanced Image Descriptions
Integrates with OpenRouter/OpenAI APIs to automatically generate detailed descriptions for PPTX and image files. Supports multiple AI models (such as Claude Opus), making it especially suitable for processing scientific presentations and technical documents containing charts and visual content.
3. OCR and Transcription Capabilities
Built-in image OCR can recognize text in scanned documents and images (requires installing tesseract). Supports speech transcription for WAV and MP3 audio files, as well as automatic extraction of YouTube video subtitles, converting multimedia content into searchable text.
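For the AI-enhanced image descriptions listed above, the following sketch shows one way the wiring might look. It assumes the openai package and MarkItDown's llm_client and llm_model constructor arguments. The helper names client_settings and describing_converter are hypothetical, as is the choice to read the API key from an environment variable.

```python
import os
from typing import Optional


def client_settings(base_url: Optional[str] = None,
                    key_var: str = "OPENAI_API_KEY") -> dict:
    """Collect keyword arguments for the OpenAI client.

    The key is read from the environment variable named by key_var
    (an assumption of this sketch) rather than hard-coded.
    """
    api_key = os.environ.get(key_var)
    if not api_key:
        raise RuntimeError(f"set {key_var} before requesting image descriptions")
    settings = {"api_key": api_key}
    if base_url:
        # For an OpenAI-compatible provider such as OpenRouter,
        # pass that provider's API base URL here.
        settings["base_url"] = base_url
    return settings


def describing_converter(model: str, base_url: Optional[str] = None,
                         key_var: str = "OPENAI_API_KEY"):
    """Return a MarkItDown instance that asks `model` to describe images."""
    # Lazy imports: both packages are optional for the rest of the sketch.
    from markitdown import MarkItDown
    from openai import OpenAI

    client = OpenAI(**client_settings(base_url, key_var))
    return MarkItDown(llm_client=client, llm_model=model)
```

Usage would look like describing_converter("gpt-4o").convert("slides.pptx").text_content, where the model identifier and file name are placeholders. Which models are available depends on the provider behind the client.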
Frequently Asked Questions
Which file formats does MarkItDown support?
MarkItDown supports 15+ formats, including: PDF, DOCX (Word), PPTX (PowerPoint), XLSX (Excel), images (JPEG/PNG/GIF/WebP, including OCR), audio (WAV/MP3, including transcription), HTML, CSV, JSON, XML, ZIP, EPUB, and YouTube video URLs. You can selectively install the dependencies for the formats you need.
How do I use MarkItDown to convert a PDF to Markdown?
Basic usage:
    markitdown document.pdf -o output.md
Python API usage:
    from markitdown import MarkItDown
    md = MarkItDown()
    result = md.convert("document.pdf")
    print(result.text_content)
For complex PDFs, enabling Azure Document Intelligence is recommended to improve conversion quality.
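The Azure Document Intelligence option mentioned above might be enabled as in the sketch below. It assumes MarkItDown's docintel_endpoint constructor argument. The function pdf_to_markdown and its factory parameter are hypothetical, added only so the logic can be tried with a stand-in class. Azure credentials are not shown and must be configured separately.

```python
from typing import Callable, Optional


def pdf_to_markdown(path: str, endpoint: Optional[str] = None,
                    factory: Optional[Callable] = None) -> str:
    """Convert a PDF, using Document Intelligence when an endpoint is given.

    `factory` defaults to the MarkItDown class; it is a parameter only
    so the function can be exercised with a stand-in.
    """
    if factory is None:
        from markitdown import MarkItDown  # lazy import
        factory = MarkItDown
    # Without an endpoint, fall back to the plain local conversion.
    kwargs = {"docintel_endpoint": endpoint} if endpoint else {}
    return factory(**kwargs).convert(path).text_content
```

Calling pdf_to_markdown("document.pdf") behaves like the basic example, while supplying the endpoint URL of your own Document Intelligence resource routes the conversion through that service.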
What is the difference between MarkItDown and Pandoc?
MarkItDown focuses on converting various file formats into LLM-friendly Markdown and is particularly optimized for token efficiency with AI models. It includes built-in OCR, audio transcription, and AI image description features out of the box. Pandoc is a more general document conversion tool that supports a wider range of format-to-format conversions but does not include OCR or AI-enhanced features. The two can be used together: MarkItDown handles file-to-Markdown conversion, and Pandoc handles Markdown-to-other-format output.
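The combined workflow described above, with MarkItDown producing Markdown and Pandoc producing the final format, could be sketched as follows. It assumes a pandoc executable on PATH and uses only Pandoc's --from and --output options. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(out_path: Path, source_format: str = "markdown") -> list:
    """Build a pandoc invocation that reads Markdown from standard input.

    Pandoc infers the output format from the file extension of out_path.
    """
    return ["pandoc", "--from", source_format, "--output", str(out_path)]


def markdown_to_file(markdown_text: str, out_path: Path) -> None:
    """Pipe Markdown text into pandoc and write out_path."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc executable not found on PATH")
    subprocess.run(pandoc_command(out_path), input=markdown_text,
                   text=True, encoding="utf-8", check=True)


def convert_via_markdown(source: str, out_path: Path) -> None:
    """File -> Markdown (MarkItDown) -> other format (Pandoc)."""
    from markitdown import MarkItDown  # lazy import

    markdown_text = MarkItDown().convert(source).text_content
    markdown_to_file(markdown_text, out_path)
```

For example, convert_via_markdown("slides.pptx", Path("slides.docx")) would extract the slide text as Markdown and then hand it to Pandoc. Both file names are placeholders.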