OpenAI Whisper API — OpenClaw Skill

What This Skill Does

The OpenAI Whisper API skill lets your OpenClaw agent transcribe audio files by sending them to OpenAI's cloud API. Unlike the local Whisper skill, this skill uploads the audio to OpenAI's servers for processing, which is typically much faster, especially on machines without a GPU. You can specify the audio language, pass speaker names as a prompt to improve accuracy, and get the transcript as text or JSON.

The skill uses a bundled shell script that handles the curl call to OpenAI's /v1/audio/transcriptions endpoint. You can specify the model (defaults to whisper-1), output format, language hint, and speaker name prompts. Transcripts are saved alongside the original audio file by default, or to a custom output path.
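As a rough illustration of the request described above, the sketch below assembles a comparable curl command in Python. It is not the bundled script itself: the function names, the choice of `--form-string` for text fields, and the `.txt`/`.json` naming of the default output file are assumptions made for this example. The form fields (`file`, `model`, `response_format`, `language`, `prompt`) are the ones the transcription endpoint accepts.

```python
from pathlib import Path

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def default_output_path(audio_path, response_format="text"):
    """Put the transcript next to the audio file.

    The .txt/.json naming is an assumption for this sketch; the bundled
    script may name its output differently.
    """
    suffix = ".json" if response_format == "json" else ".txt"
    return Path(audio_path).expanduser().with_suffix(suffix)


def build_curl_command(audio_path, api_key, model="whisper-1",
                       response_format="text", language=None,
                       prompt=None, out_path=None):
    """Return a curl argument list for a transcription request (hypothetical helper)."""
    audio = Path(audio_path).expanduser()
    if out_path:
        out = Path(out_path).expanduser()
    else:
        out = default_output_path(audio, response_format)
    cmd = [
        "curl", "-sS", API_URL,
        # Note: the key is visible in the process list while curl runs.
        "-H", f"Authorization: Bearer {api_key}",
        # -F uploads the file; paths containing ',' or ';' would need quoting.
        "-F", f"file=@{audio}",
        # --form-string sends the value literally, so commas in a prompt are safe.
        "--form-string", f"model={model}",
        "--form-string", f"response_format={response_format}",
    ]
    if language:
        cmd += ["--form-string", f"language={language}"]
    if prompt:
        cmd += ["--form-string", f"prompt={prompt}"]
    cmd += ["-o", str(out)]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would perform the upload and write the transcript to the output path. Building the command as a list rather than a single string avoids shell quoting problems with file names and prompts.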

Choose the API skill when you need fast transcription and are comfortable sending audio to OpenAI's servers. For sensitive audio that shouldn't leave your machine, use the local Whisper skill instead. Both skills pair well with the Himalaya skill for emailing transcripts.

Example Prompts

Transcribe this voice memo quickly using the Whisper API and save the text

Transcribe the meeting recording at /tmp/meeting.ogg with speaker names "Peter, Daniel"

Transcribe this audio file in English and give me the result as JSON

Use the Whisper API to transcribe ~/Downloads/interview.m4a and save to /tmp/transcript.txt

Quickly transcribe all the voice memos in my Downloads folder using the cloud API

Requirements

Dependencies: curl (pre-installed on most systems)

API key: OPENAI_API_KEY environment variable (required)
Internet: Requires network access to OpenAI's API
Cost: Charged per minute of audio transcribed by OpenAI
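Because billing is per minute of audio, a rough cost estimate is simple arithmetic. The helper below is a hypothetical sketch: the per-minute rate is a required argument because this page does not state one, so look up the current figure on OpenAI's pricing page. It also assumes a simple proportional charge and ignores any rounding rules OpenAI may apply.

```python
def estimate_cost(duration_seconds, rate_per_minute):
    """Estimate transcription cost as minutes of audio times the per-minute rate.

    rate_per_minute must be supplied by the caller; no real price is
    hard-coded here.
    """
    if duration_seconds < 0 or rate_per_minute < 0:
        raise ValueError("duration and rate must be non-negative")
    minutes = duration_seconds / 60
    return round(minutes * rate_per_minute, 4)


# Illustrative placeholder rate only, NOT OpenAI's actual price.
PLACEHOLDER_RATE = 0.01
ninety_minute_meeting = estimate_cost(90 * 60, PLACEHOLDER_RATE)
```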

Setup on KiwiClaw

The Whisper API skill is available on all KiwiClaw plans. For BYOK plans, set your OPENAI_API_KEY in the KiwiClaw dashboard under environment variables. For Standard plans, transcription is included in your usage allocation. No additional setup needed.

Setup Self-Hosted

Get an OpenAI API key from platform.openai.com
Set the key: export OPENAI_API_KEY="sk-your-key-here"
Or configure in ~/.openclaw/openclaw.json under skills.openai-whisper-api.apiKey
Test with any audio file. The only dependency is curl, which is pre-installed on most systems
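Before testing, it can help to confirm that a key is actually visible to the skill. The sketch below checks the two locations named in the steps above. Several details are assumptions for illustration: the function name, the precedence (environment variable first), the reading of the dotted path `skills.openai-whisper-api.apiKey` as nested JSON objects, and the file being plain JSON.

```python
import json
import os
from pathlib import Path

SKILL_NAME = "openai-whisper-api"


def resolve_api_key(env=None, config_path="~/.openclaw/openclaw.json"):
    """Return the API key from the environment or the config file, else None.

    Assumed precedence: OPENAI_API_KEY wins over the config file.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key:
        return key

    path = Path(config_path).expanduser()
    if not path.is_file():
        return None
    try:
        node = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # Assumes plain JSON; a file with comments would not parse here.
        return None

    # Walk the assumed nesting: {"skills": {"openai-whisper-api": {"apiKey": ...}}}
    for part in ("skills", SKILL_NAME, "apiKey"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) and node else None
```

A result of `None` means neither location supplied a key, which is the first thing to check if a test transcription fails with an authentication error.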

Related Skills

OpenAI Whisper (Local) -- offline transcription with no API key needed
Himalaya -- email transcripts after generating them
CamSnap -- capture camera clips for transcription
Model Usage -- track API spending on transcription

FAQ

What can the Whisper API skill do in OpenClaw?

The Whisper API skill transcribes audio files by sending them to OpenAI's cloud API endpoint. It is faster than local Whisper, supports language hints and speaker name prompts, and outputs text or JSON transcripts.

How is the Whisper API different from local Whisper?

The Whisper API sends audio to OpenAI's servers for processing, which is significantly faster, especially on machines without a GPU. Local Whisper processes everything on your machine, so no data leaves it. Choose the API for speed and local Whisper for privacy. See our pricing comparison for cost details.

Does the Whisper API skill require an OpenAI API key?

Yes. Set the OPENAI_API_KEY environment variable or configure it in your OpenClaw skill settings. The API charges per minute of audio transcribed.

Is the Whisper API skill safe to use?

The Whisper API skill has been security-vetted by KiwiClaw. Audio is sent to OpenAI's servers for processing. If your audio contains sensitive information, consider using the local Whisper skill instead.