Sherpa-ONNX TTS — OpenClaw Skill

Local offline text-to-speech via sherpa-onnx. Generate speech from text without any cloud API dependency.

Audio & Speech · Vetted

What This Skill Does

The Sherpa-ONNX TTS skill gives your OpenClaw agent the ability to convert text into natural-sounding speech entirely offline. It uses the sherpa-onnx runtime with ONNX-based voice models, meaning all audio synthesis happens locally on your machine with zero network calls. This is ideal for privacy-sensitive environments, air-gapped systems, or anywhere you need reliable TTS without depending on a cloud provider.

The skill ships with the Piper en_US lessac high-quality voice model by default, but you can swap in any compatible model from the sherpa-onnx tts-models release page. This includes multilingual voices for dozens of languages. The runtime supports macOS (universal binary), Linux x64, and Windows x64, and the skill handles downloading the correct platform binary automatically during installation.

Output is generated as WAV files that your agent can save to disk, attach to messages, or pipe into downstream audio workflows. You can control the voice model, output path, and synthesis parameters (such as speaking speed and speaker ID for multi-speaker models) through environment variables and CLI flags.
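Under the hood, an invocation might look like the sketch below, which assembles a command line for the upstream sherpa-onnx-offline-tts binary from the skill's environment variables. This is a minimal sketch, not the skill's actual implementation: the `bin/` subdirectory and the Piper file names (`en_US-lessac-high.onnx`, `tokens.txt`, `espeak-ng-data`) are assumptions based on the standard sherpa-onnx release and Piper model layout.

```python
import os
from pathlib import Path

def build_tts_command(text: str, out_path: str) -> list[str]:
    """Assemble a sherpa-onnx-offline-tts invocation from the skill's
    environment variables. File names inside the model directory follow
    the Piper VITS layout (assumed: model .onnx, tokens.txt,
    espeak-ng-data)."""
    runtime = Path(os.environ["SHERPA_ONNX_RUNTIME_DIR"])
    model = Path(os.environ["SHERPA_ONNX_MODEL_DIR"])
    return [
        str(runtime / "bin" / "sherpa-onnx-offline-tts"),
        f"--vits-model={model / 'en_US-lessac-high.onnx'}",
        f"--vits-tokens={model / 'tokens.txt'}",
        f"--vits-data-dir={model / 'espeak-ng-data'}",
        f"--output-filename={out_path}",
        text,
    ]
```

The returned list can be passed straight to `subprocess.run`, keeping the text argument safely separate from the flags.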

Example Prompts

Convert this email draft to an audio file so I can review it while walking

Read the summary of today's meeting notes aloud and save it as meeting-recap.wav

Generate a spoken version of my blog post intro paragraph

Create an audio greeting that says "Welcome to the team, Sarah" and save it to /tmp/welcome.wav

Turn this bullet-point list into a spoken audio file I can share with the team

Synthesize speech for the notification message "Your deployment is complete" using the local TTS engine

Requirements

Environment variables: SHERPA_ONNX_RUNTIME_DIR and SHERPA_ONNX_MODEL_DIR must be configured.

  • Runtime: The sherpa-onnx shared library for your platform (macOS, Linux, or Windows) -- downloaded automatically during skill installation
  • Voice model: At least one ONNX voice model (Piper en_US lessac high is included by default)
  • Supported platforms: macOS (universal), Linux x64, Windows x64

Setup on KiwiClaw

This skill is pre-installed on all KiwiClaw plans with the sherpa-onnx runtime and default English voice model ready to go. Your agent can generate speech audio immediately -- no environment variables or model downloads to configure. Upgrade to additional voices from the KiwiClaw dashboard.

Setup Self-Hosted

  1. Download the sherpa-onnx runtime for your OS (extracts into ~/.openclaw/tools/sherpa-onnx-tts/runtime)
  2. Download a voice model (extracts into ~/.openclaw/tools/sherpa-onnx-tts/models)
  3. Set SHERPA_ONNX_RUNTIME_DIR to the runtime directory path
  4. Set SHERPA_ONNX_MODEL_DIR to the model directory path (e.g., models/vits-piper-en_US-lessac-high)
  5. Configure the skill in your openclaw.json under skills.entries["sherpa-onnx-tts"]

Related Skills

  • Sag -- cloud-based ElevenLabs TTS with premium voices and expressive audio tags
  • Songsee -- generate spectrograms and visualizations from the audio files you create
  • Video Frames -- extract frames from video to pair with generated audio narration
  • SonosCLI -- play your generated audio on Sonos speakers

FAQ

Does Sherpa-ONNX TTS require an internet connection?

No. Sherpa-ONNX TTS runs entirely offline using local ONNX models. The runtime and voice models are downloaded once during setup and all speech synthesis happens on-device with zero cloud calls.

What voice models are supported?

The skill ships with the Piper en_US lessac (high quality) model by default. You can swap in any compatible model from the sherpa-onnx tts-models release page, including multilingual voices for German, French, Spanish, and many other languages.

Which operating systems does Sherpa-ONNX TTS support?

Sherpa-ONNX TTS supports macOS (universal binary), Linux x64, and Windows x64. The skill automatically downloads the correct runtime for your platform.

Can I use multiple voices with this skill?

Yes. Download additional voice models from the sherpa-onnx tts-models releases and point the SHERPA_ONNX_MODEL_DIR environment variable at the desired model directory, or use the --model-file flag per invocation.
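Switching voices from the environment-variable side can be sketched as follows. The helper names and the one-directory-per-voice layout under `models/` are assumptions for illustration; the German model name in the usage example is one of the voices published on the sherpa-onnx tts-models release page.

```python
import os
from pathlib import Path

def list_voice_models(models_root: str) -> list[str]:
    """Enumerate downloaded voice models, assuming one extracted
    directory per voice under the skill's models folder."""
    return sorted(p.name for p in Path(models_root).iterdir() if p.is_dir())

def select_voice(models_root: str, name: str) -> None:
    """Point SHERPA_ONNX_MODEL_DIR at the chosen voice so subsequent
    synthesis calls pick it up."""
    target = Path(models_root) / name
    if not target.is_dir():
        raise FileNotFoundError(f"no such voice model: {target}")
    os.environ["SHERPA_ONNX_MODEL_DIR"] = str(target)
```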

Add offline text-to-speech to your AI agent

Local TTS with no cloud dependency. Your agent speaks without sending data anywhere.