OpenAI Whisper — OpenClaw Skill
Local speech-to-text transcription with no API key required.
What This Skill Does
The OpenAI Whisper skill gives your OpenClaw agent local speech-to-text transcription powered by OpenAI's Whisper model. Your agent can transcribe audio files in any format ffmpeg can decode (MP3, M4A, WAV, OGG, and more), translate non-English audio to English, and output results as plain text, SRT subtitles, or VTT captions. Everything runs locally on your machine -- no API key or internet connection is needed after the initial model download.
Whisper supports multiple model sizes from tiny (fastest, least accurate) to large (slowest, most accurate), with turbo as the default on most installations. Models are downloaded once to ~/.cache/whisper and cached for future use. This makes it ideal for transcribing meeting recordings, voice memos, podcast episodes, or any audio content without sending data to external servers.
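As a sketch of the size/accuracy trade-off, assuming the `whisper` CLI is already installed, you pick a model per run with `--model` (the filename `memo.wav` is just a placeholder):

```shell
# Fast draft pass with the smallest model (small download, lower accuracy)
whisper memo.wav --model tiny --output_format txt

# Higher-accuracy pass with a larger model (bigger one-time download)
whisper memo.wav --model medium --output_format txt

# Downloaded weights are cached here and reused on later runs
ls ~/.cache/whisper
```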
For faster cloud-based transcription when privacy is less of a concern, see the Whisper API skill. For capturing audio from cameras, pair this with the CamSnap skill.
Example Prompts
Transcribe the meeting recording at ~/Downloads/meeting.mp3 and save it as a text file
Translate this Spanish audio file to English text
Generate SRT subtitles for the podcast episode at /tmp/episode.m4a
Transcribe this voice memo using the medium model for better accuracy
Convert the audio from my camera clip to text and summarize what was said
Transcribe all the .m4a files in ~/Voice Memos/ and put the transcripts in ~/Transcripts/
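A batch prompt like the last one maps to a simple loop. A minimal sketch, assuming the `whisper` CLI is on your PATH and the folders exist (paths are illustrative):

```shell
# Transcribe every .m4a in ~/Voice Memos/ into ~/Transcripts/ as plain text
mkdir -p ~/Transcripts
for f in ~/"Voice Memos"/*.m4a; do
  whisper "$f" --model turbo --output_format txt --output_dir ~/Transcripts
done
```

Each input file produces a matching `.txt` file in the output directory.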
Requirements
Binary dependency: whisper
- macOS: `brew install openai-whisper`
- Storage: Models download to `~/.cache/whisper` (74MB to 2.9GB depending on model size)
- No API key required: Runs entirely locally
Setup on KiwiClaw
OpenAI Whisper is pre-installed on all KiwiClaw tenant machines with the turbo model ready to use. Upload audio files or point your agent to a file path and transcription starts immediately. No API key or configuration needed. Manage files from the KiwiClaw dashboard.
Setup Self-Hosted
- Install Whisper: `brew install openai-whisper`
- Verify: `whisper --help`
- Test: `whisper /path/to/audio.mp3 --model turbo --output_format txt`
- The model downloads automatically on first run (~1.5GB for turbo)
- Use `--model tiny` for faster but less accurate transcription
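Once the test command works, the same CLI covers the translation and subtitle use cases described above. The flags shown (`--task`, `--output_format`, `--output_dir`) are listed in `whisper --help`; the filenames are placeholders:

```shell
# Translate non-English speech directly to English text
whisper interview_es.mp3 --model turbo --task translate --output_format txt

# Generate SRT subtitles for a podcast episode
whisper episode.m4a --model turbo --output_format srt --output_dir ./subs

# Or emit VTT captions instead
whisper episode.m4a --model turbo --output_format vtt --output_dir ./subs
```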
Related Skills
- OpenAI Whisper API -- cloud-based transcription for faster results
- CamSnap -- capture camera clips for audio transcription
- Bear Notes -- save transcripts to Bear for reference
- Himalaya -- email transcripts to colleagues
FAQ
What can the OpenAI Whisper skill do in OpenClaw?
The Whisper skill lets your OpenClaw agent transcribe audio and video files locally using OpenAI's Whisper model. It supports multiple languages, translation to English, and output formats including plain text, SRT subtitles, and VTT captions.
Does the Whisper skill require an API key?
No. The local Whisper CLI runs entirely on your machine with no API key needed. Models are downloaded once to ~/.cache/whisper and run offline after that. For cloud-based transcription, see the Whisper API skill. Compare costs in our self-hosting cost guide.
How is this different from the Whisper API skill?
The Whisper (local) skill runs the model directly on your machine -- no internet connection or API key needed. The Whisper API skill sends audio to OpenAI's servers for transcription, which is faster but requires an API key and internet access.
Is the OpenAI Whisper skill safe to use?
The Whisper skill has been security-vetted by KiwiClaw. It processes audio entirely locally -- no data leaves your machine. The model weights are downloaded from OpenAI's official source on first use.