audio-transcriber
Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration
Author
Eric Andrade
Category
Document Processing
Install
Download and extract to your skills directory
Copy command and send to OpenClaw for auto-install:
Audio Transcriber - Intelligent Audio-to-Document Tool
Overview
Audio Transcriber is a zero-configuration audio-to-text tool. It transcribes audio files such as meeting and interview recordings into professional Markdown documents and generates meeting minutes, action items, and executive summaries.
Use Cases
Automatically transcribe recordings of team meetings and client calls into structured documents, identify different speakers, extract discussion points and resolutions, and save time on manual organization.
Quickly convert long-form audio from journalist interviews, academic lectures, podcast recordings, and similar scenarios into searchable text documents, with support for SRT/VTT subtitle export.
Structure audio content into transcriptions with timestamps to facilitate later retrieval, citation, and knowledge preservation.
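The SRT export mentioned in the use cases can be illustrated with a minimal sketch. It assumes the transcription is available as (start, end, text) tuples with times in seconds; the function names `srt_timestamp` and `to_srt` are hypothetical and are not taken from the tool itself.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues.

    The segment layout is an assumption made for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Welcome, everyone."), (2.5, 4.0, "Let's begin.")]))
```

VTT output would differ mainly in the `WEBVTT` header line and in using a period instead of a comma before the milliseconds.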
Core Features
Automatically detects the Faster-Whisper or Whisper engine installed on the system and starts transcribing with no API key or manual configuration. Supports common formats including MP3, WAV, M4A, OGG, FLAC, and WEBM.
Automatically extracts attendees, discussion topics, resolutions, and action items from transcriptions, and can integrate with LLMs to further generate executive summaries, making meeting records more valuable.
Supports speaker diarization, automatically identifies the number of participants and speech segments, and extracts metadata such as audio duration, file size, and language, outputting a complete transcription report.
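The engine auto-detection described under Core Features could work roughly as follows. This is a sketch, not the tool's actual logic: it assumes the engines are installed as the Python packages `faster_whisper` and `whisper`, and that Faster-Whisper is preferred when both are present. The names `detect_engine`, `is_supported`, and `SUPPORTED_EXTENSIONS` are hypothetical.

```python
import importlib.util
from pathlib import Path

# Extensions taken from the format list above; the real tool may accept more.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}


def detect_engine(candidates=("faster_whisper", "whisper")):
    """Return the first importable engine module name, or None.

    The preference order (Faster-Whisper first) is an assumption.
    """
    for module_name in candidates:
        # find_spec checks availability without importing the package.
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


def is_supported(path) -> bool:
    """Check the file extension against the supported list, ignoring case."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    engine = detect_engine()
    print(f"Detected engine: {engine or 'none installed'}")
    print(f"meeting.MP3 supported: {is_supported('meeting.MP3')}")
```

Checking with `find_spec` rather than a `try`/`import` keeps detection cheap, since loading a speech model package can be slow.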
Frequently Asked Questions
What audio formats does Audio Transcriber support?
It supports mainstream audio formats such as MP3, WAV, M4A, OGG, FLAC, WEBM, and MP4. If ffmpeg is installed, it can also convert incompatible formats automatically.
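A conversion fallback of this kind might look like the sketch below. It is an assumption about how such a step could be written, not the tool's own code: the helper names are hypothetical, and the choice of 16 kHz mono WAV as the target is illustrative (it matches the sample rate Whisper models work with).

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, target):
    """Build an ffmpeg command that converts source to 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the target if it already exists
        "-i", str(source),
        "-ar", "16000",  # audio sample rate in Hz
        "-ac", "1",      # number of audio channels (mono)
        str(target),
    ]


def convert_to_wav(source):
    """Convert source to a .wav file next to it; return None if ffmpeg is missing.

    Intended for non-WAV inputs only, since the target reuses the source stem.
    """
    if shutil.which("ffmpeg") is None:
        return None
    target = Path(source).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(source, target), check=True, capture_output=True)
    return target


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("talk.wma", "talk.wav")))
```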
How long does it take to transcribe a one-hour meeting?
With Faster-Whisper, processing time is about 10–20% of the audio duration, so a 1-hour recording takes about 6–12 minutes. Actual time depends on hardware performance and model choice.
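The estimate above is simple arithmetic, sketched here for other recording lengths. The 10–20% ratios come from the answer above; the function name is hypothetical, and real timings will vary with hardware and model size.

```python
def estimate_processing_minutes(audio_minutes, low_ratio=0.10, high_ratio=0.20):
    """Return a (fastest, slowest) processing-time estimate in minutes."""
    return audio_minutes * low_ratio, audio_minutes * high_ratio


if __name__ == "__main__":
    for length in (30, 60, 90):
        low, high = estimate_processing_minutes(length)
        print(f"{length} min of audio: roughly {low:.0f} to {high:.0f} min to transcribe")
```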
Does this tool require an internet connection?
No. Whisper and Faster-Whisper models run locally, so transcription works completely offline. An internet connection is needed only when generating summaries with an LLM.