OpenAI Whisper API — OpenClaw Skill
Cloud-based audio transcription via OpenAI's Whisper API endpoint.
What This Skill Does
The OpenAI Whisper API skill lets your OpenClaw agent transcribe audio files by sending them to OpenAI's cloud API. Unlike the local Whisper skill, it processes audio on OpenAI's servers, which is typically much faster, especially on machines without a GPU. It supports a language hint and speaker name prompts for better accuracy, and it outputs text or JSON transcripts.
The skill uses a bundled shell script that handles the curl call to OpenAI's /v1/audio/transcriptions endpoint. You can specify the model (defaults to whisper-1), output format, language hint, and speaker name prompts. Transcripts are saved alongside the original audio file by default, or to a custom output path.
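As a rough illustration of the request described above, the following shell sketch assembles a comparable curl call. It is not the bundled script: the function name transcribe, its argument order, the WHISPER_MODEL and CURL variables, and the .txt/.json output naming are assumptions made for this example. The form fields file, model, response_format, language, and prompt are the ones OpenAI documents for this endpoint.

```shell
#!/bin/sh
# Illustrative sketch only; the skill's bundled script may differ.
# Usage: transcribe AUDIO [FORMAT] [LANGUAGE] [PROMPT] [OUTPUT]
# The body runs in a subshell ( ... ) so its variables do not leak.
transcribe() (
  audio="$1"
  format="${2:-text}"                  # "text" or "json"
  language="${3:-}"                    # optional hint such as "en"
  prompt="${4:-}"                      # optional, e.g. speaker names
  model="${WHISPER_MODEL:-whisper-1}"  # assumed variable name

  # Assumed naming: write next to the audio file unless OUTPUT is given.
  case "$format" in
    json) ext="json" ;;
    *)    ext="txt" ;;
  esac
  output="${5:-${audio%.*}.${ext}}"

  # Build the curl argument list in the positional parameters.
  set -- --silent --show-error --fail \
    -H "Authorization: Bearer ${OPENAI_API_KEY:-}" \
    -F "file=@${audio}" \
    -F "model=${model}" \
    -F "response_format=${format}"
  if [ -n "$language" ]; then set -- "$@" -F "language=${language}"; fi
  if [ -n "$prompt" ]; then set -- "$@" -F "prompt=${prompt}"; fi

  # CURL can be overridden (for example CURL=echo) to inspect the call
  # without sending anything; the response body is written to $output.
  "${CURL:-curl}" "$@" "https://api.openai.com/v1/audio/transcriptions" > "$output"
)
```

With this sketch, transcribe /tmp/meeting.ogg json en "Peter, Daniel" would write /tmp/meeting.json, and passing a fifth argument such as /tmp/transcript.txt would redirect the result to that path instead.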
Choose the API skill when you need fast transcription and are comfortable sending audio to OpenAI's servers. For sensitive audio that shouldn't leave your machine, use the local Whisper skill instead. Both skills pair well with the Himalaya skill for emailing transcripts.
Example Prompts
Transcribe this voice memo quickly using the Whisper API and save the text
Transcribe the meeting recording at /tmp/meeting.ogg with speaker names "Peter, Daniel"
Transcribe this audio file in English and give me the result as JSON
Use the Whisper API to transcribe ~/Downloads/interview.m4a and save to /tmp/transcript.txt
Quickly transcribe all the voice memos in my Downloads folder using the cloud API
Requirements
- Dependencies: curl (pre-installed on most systems)
- API key: OPENAI_API_KEY environment variable (required)
- Internet: Requires network access to OpenAI's API
- Cost: Charged per minute of audio transcribed by OpenAI
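The two requirements that can be verified locally, curl and the API key, can be checked before the first transcription. The sketch below is a hypothetical preflight helper, not something the skill ships; the function name check_requirements and its messages are invented for illustration, and it does not test network access.

```shell
#!/bin/sh
# Hypothetical preflight check; not part of the skill itself.
# Prints one line per missing requirement and returns non-zero
# if anything is missing.
check_requirements() {
  missing=0
  # curl must be on PATH for the skill's request to work.
  if ! command -v curl >/dev/null 2>&1; then
    echo "missing: curl"
    missing=1
  fi
  # The key must be set and non-empty.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "missing: OPENAI_API_KEY"
    missing=1
  fi
  return "$missing"
}
```

A wrapper script could call check_requirements || exit 1 at the top so that a missing key fails early with a readable message.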
Setup on KiwiClaw
The Whisper API skill is available on all KiwiClaw plans. For BYOK plans, set your OPENAI_API_KEY in the KiwiClaw dashboard under environment variables. For Standard plans, transcription is included in your usage allocation. No additional setup needed.
Setup Self-Hosted
- Get an OpenAI API key from platform.openai.com
- Set the key: export OPENAI_API_KEY="sk-your-key-here"
- Or configure in ~/.openclaw/openclaw.json under skills.openai-whisper-api.apiKey
- Test with any audio file -- the skill uses curl, which is pre-installed
Related Skills
- OpenAI Whisper (Local) -- offline transcription with no API key needed
- Himalaya -- email transcripts after generating them
- CamSnap -- capture camera clips for transcription
- Model Usage -- track API spending on transcription
FAQ
What can the Whisper API skill do in OpenClaw?
The Whisper API skill transcribes audio files by sending them to OpenAI's cloud API endpoint. It is typically faster than local Whisper, supports language hints and speaker name prompts, and outputs text or JSON transcripts.
How is the Whisper API different from local Whisper?
The Whisper API sends audio to OpenAI's servers for processing, which is typically much faster, especially on machines without a GPU. Local Whisper processes everything on your machine, so no data leaves it. Choose the API for speed and local for privacy. See our pricing comparison for cost details.
Does the Whisper API skill require an OpenAI API key?
Yes. Set the OPENAI_API_KEY environment variable or configure it in your OpenClaw skill settings. The API charges per minute of audio transcribed.
Is the Whisper API skill safe to use?
The Whisper API skill has been security-vetted by KiwiClaw. Audio is sent to OpenAI's servers for processing. If your audio contains sensitive information, consider using the local Whisper skill instead.