Sag — OpenClaw Skill
ElevenLabs text-to-speech with premium voices, expressive audio tags, and a macOS-style say UX.
What This Skill Does
The Sag skill gives your OpenClaw agent high-quality text-to-speech powered by ElevenLabs. It offers a simple command-line UX modeled on the macOS say command, with premium voices, expressive audio tags for emotional control, pronunciation tuning, and multilingual support. Audio can be played locally or saved to a file for sending via chat providers.
The v3 model (default) is the most expressive, supporting audio tags like [whispers], [shouts], [sings], [laughs], [sarcastic], [curious], and [excited] for nuanced delivery. Pause control uses [pause], [short pause], and [long pause]. The v2 multilingual model adds SSML <break> support, and flash v2.5 trades some quality for speed.
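As a minimal sketch of how tagged text might be assembled before it is handed to sag, the helpers below prefix a line with an audio tag and place a [pause] line between stanzas. The function names tag_line and join_with_pauses are hypothetical and not part of sag. Putting [pause] on its own line between stanzas is an assumption; this page only states that tags go at the start of lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix one line of text with a v3 audio tag.
tag_line() {
  local tag="$1" text="$2"
  printf '[%s] %s\n' "$tag" "$text"
}

# Hypothetical helper: print each argument on its own line, with a
# [pause] line between consecutive arguments (an assumed placement).
join_with_pauses() {
  local first=1 stanza
  for stanza in "$@"; do
    if [ "$first" -eq 0 ]; then
      printf '[pause]\n'
    fi
    printf '%s\n' "$stanza"
    first=0
  done
}

# Build a whispered line; this only prints text and does not call sag.
script="$(tag_line whispers "This is a secret")"
printf '%s\n' "$script"
```

The resulting string could then be passed to sag as a single quoted argument, in the same way as the self-hosted test command on this page.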
This is ideal for creating voice responses, generating audio content for sharing on WhatsApp or other messaging platforms, producing narration for video content, or adding a voice interface to your agent. Pronunciation can be fine-tuned with respelling, hyphens, casing, and the --normalize flag for numbers and URLs.
Example Prompts
Say "Hello there, welcome to the presentation" using the Roger voice
Generate a voice response explaining the quarterly results in an excited tone
Create an audio message whispering "This is a secret" and save it as an MP3
List all available ElevenLabs voices so I can pick one
Record a dramatic reading of this poem with pauses between stanzas
Say this meeting summary aloud using the multilingual model in German
Generate an audio reply as a sarcastic scientist character
Create a voice memo of these action items and send it via WhatsApp
Requirements
Binary dependency: sag must be installed. API key required.
- Install via Homebrew: brew install steipete/tap/sag
- API key: Set ELEVENLABS_API_KEY (or SAG_API_KEY) from your ElevenLabs account
- Optional: Set ELEVENLABS_VOICE_ID or SAG_VOICE_ID for a default voice
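The environment variables above can be checked before the agent tries to speak. The sketch below reports which variable supplies the key and the default voice without printing the secret itself. The helper name first_set_var is hypothetical, and checking the ELEVENLABS_ name before the SAG_ name is an assumption; this page does not say which one sag prefers when both are set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NAME of the first variable in the
# argument list that is set and non-empty; return 1 if none is.
first_set_var() {
  local name
  for name in "$@"; do
    if [ -n "${!name:-}" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# Required: an API key in either variable (check order is an assumption).
if key_var="$(first_set_var ELEVENLABS_API_KEY SAG_API_KEY)"; then
  echo "API key found in $key_var"
else
  echo "No API key: set ELEVENLABS_API_KEY or SAG_API_KEY" >&2
fi

# Optional: a default voice ID.
if voice_var="$(first_set_var ELEVENLABS_VOICE_ID SAG_VOICE_ID)"; then
  echo "Default voice taken from $voice_var"
else
  echo "No default voice set (optional)"
fi
```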
Setup on KiwiClaw
Add your ElevenLabs API key in the KiwiClaw dashboard settings. Sag is pre-installed and your agent can generate speech immediately. On Standard plans, a pooled ElevenLabs key may be available for included usage.
Setup Self-Hosted
- Install sag: brew install steipete/tap/sag
- Set ELEVENLABS_API_KEY in your environment
- List available voices: sag voices
- Test: sag "Hello from your AI agent"
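The self-hosted steps above could be wrapped in a small preflight script, sketched here under the assumption that a missing binary or key should stop the run before any speech is attempted. The function names have_cmd and preflight are hypothetical. The two sag invocations are the ones listed in the steps.

```shell
#!/usr/bin/env bash
# Return 0 if the named command is on PATH, 1 otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical preflight: report every missing prerequisite, then
# return 1 if anything was missing and 0 otherwise.
preflight() {
  local status=0
  if ! have_cmd sag; then
    echo "missing: sag (brew install steipete/tap/sag)" >&2
    status=1
  fi
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "missing: ELEVENLABS_API_KEY" >&2
    status=1
  fi
  return "$status"
}

if preflight; then
  sag voices                        # list available voices
  sag "Hello from your AI agent"    # speak a test line
else
  echo "preflight failed; fix the items above and retry" >&2
fi
```

Collecting all failures before returning means a fresh machine reports both the missing binary and the missing key in one pass.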
Related Skills
- Sherpa-ONNX TTS -- offline TTS alternative with no cloud dependency
- Songsee -- visualize the audio files Sag generates
- WaCLI -- send voice messages via WhatsApp
- SonosCLI -- play generated audio on Sonos speakers
FAQ
What TTS models does Sag support?
Sag supports three ElevenLabs models: eleven_v3 (default, most expressive), eleven_multilingual_v2 (stable, multilingual), and eleven_flash_v2_5 (fastest). Choose based on your needs for expressiveness, language support, or speed.
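The choice among the three models can be written down as a simple lookup. This is an illustrative sketch only: pick_model and its priority keywords (expressive, multilingual, fast) are hypothetical names, while the model IDs are the ones listed in the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a priority keyword to one of the three
# ElevenLabs model IDs that Sag supports.
pick_model() {
  case "$1" in
    expressive)   echo "eleven_v3" ;;
    multilingual) echo "eleven_multilingual_v2" ;;
    fast)         echo "eleven_flash_v2_5" ;;
    *)            echo "unknown priority: $1" >&2; return 1 ;;
  esac
}

pick_model multilingual
```

How the chosen ID is passed to sag is not covered on this page, so check the CLI's own help output for the exact option.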
Can Sag add emotions and expressions to speech?
Yes. With the v3 model, you can use audio tags at the start of lines: [whispers], [shouts], [sings], [laughs], [sarcastic], [curious], [excited], and more. Use [pause], [short pause], or [long pause] for timing control.
What API key does Sag need?
Sag reads its key from the ELEVENLABS_API_KEY environment variable; SAG_API_KEY is also supported. Get your key from your ElevenLabs account dashboard at elevenlabs.io.
How is Sag different from Sherpa-ONNX TTS?
Sag uses the ElevenLabs cloud API for premium, highly expressive voices with emotional control. Sherpa-ONNX TTS runs entirely offline with no cloud dependency. Choose Sag for quality and expressiveness, and Sherpa-ONNX for privacy and offline use.