Sag — OpenClaw Skill | KiwiClaw Skills Hub

What This Skill Does

The Sag skill gives your OpenClaw agent high-quality text-to-speech powered by ElevenLabs. It provides a simple interface modeled on the macOS say command for generating spoken audio with premium voices, expressive audio tags for emotional control, pronunciation tuning, and multilingual support. Audio can be played locally or saved to a file for sending via chat providers.

The v3 model (default) is the most expressive, supporting audio tags like [whispers], [shouts], [sings], [laughs], [sarcastic], [curious], and [excited] for nuanced delivery. Pause control uses [pause], [short pause], and [long pause]. The v2 multilingual model adds SSML <break> support, and flash v2.5 trades some quality for speed.
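As a rough illustration of how tagged input text might be assembled before it is handed to Sag, the sketch below prefixes lines with an audio tag and separates blocks with a pause tag. The helper names tag_line and join_with_pauses are hypothetical and not part of Sag. Placing the pause tag on its own line is an assumption about layout, not documented behavior.

```python
# Hypothetical helpers for building v3-style tagged text; not part of sag itself.


def tag_line(text: str, tag: str) -> str:
    """Prefix one line of text with an audio tag such as [whispers]."""
    return f"[{tag}] {text}"


def join_with_pauses(blocks: list[str], pause: str = "pause") -> str:
    """Join blocks of text, putting a pause tag on its own line between them.

    Assumption: a pause tag on a separate line is acceptable input.
    """
    return f"\n[{pause}]\n".join(blocks)


script = join_with_pauses(
    [
        tag_line("This is a secret.", "whispers"),
        tag_line("Now everyone knows!", "excited"),
    ],
    pause="long pause",
)
print(script)
```

The resulting string keeps each audio tag at the start of its line, so it can be passed to Sag as the text to speak.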

Sag is well suited to creating voice responses, generating audio content for sharing on WhatsApp or other messaging platforms, producing narration for video content, or adding a voice interface to your agent. Pronunciation can be fine-tuned with respelling, hyphens, casing, and the --normalize flag for numbers and URLs.

Example Prompts

Say "Hello there, welcome to the presentation" using the Roger voice

Generate a voice response explaining the quarterly results in an excited tone

Create an audio message whispering "This is a secret" and save it as an MP3

List all available ElevenLabs voices so I can pick one

Record a dramatic reading of this poem with pauses between stanzas

Say this meeting summary aloud using the multilingual model in German

Generate an audio reply as a sarcastic scientist character

Create a voice memo of these action items and send it via WhatsApp

Requirements

Binary dependency: sag must be installed. API key required.

Install via Homebrew: brew install steipete/tap/sag
API key: Set ELEVENLABS_API_KEY (or SAG_API_KEY) from your ElevenLabs account
Optional: Set ELEVENLABS_VOICE_ID or SAG_VOICE_ID for a default voice
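A wrapper script may want to confirm these variables before calling Sag. The following is a minimal sketch of such a check. The function names are hypothetical, and the precedence shown (the ELEVENLABS_ name first, then the SAG_ name) is an assumption, since the requirements above do not say which one wins when both are set.

```python
import os
from typing import Mapping, Optional


def first_set(env: Mapping[str, str], names: list[str]) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the API key (required) and default voice (optional)."""
    # Assumed precedence: ELEVENLABS_* is checked before SAG_*.
    api_key = first_set(env, ["ELEVENLABS_API_KEY", "SAG_API_KEY"])
    if api_key is None:
        raise RuntimeError("Set ELEVENLABS_API_KEY or SAG_API_KEY")
    voice_id = first_set(env, ["ELEVENLABS_VOICE_ID", "SAG_VOICE_ID"])
    return {"api_key": api_key, "voice_id": voice_id}


# Example with an explicit mapping instead of the real environment.
print(resolve_settings({"SAG_API_KEY": "demo-key"}))
```

Passing a plain dictionary, as in the last line, makes the check easy to exercise without touching real credentials.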

Setup on KiwiClaw

Add your ElevenLabs API key in the KiwiClaw dashboard settings. Sag is pre-installed and your agent can generate speech immediately. On Standard plans, a pooled ElevenLabs key may be available for included usage.

Setup Self-Hosted

Install sag: brew install steipete/tap/sag
Set ELEVENLABS_API_KEY in your environment
List available voices: sag voices
Test: sag "Hello from your AI agent"
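The last two steps can also be run from a script. The sketch below is one possible way to do that, using only the two commands listed above. The names SETUP_COMMANDS, missing_binary, and run_setup_checks are hypothetical, and the script assumes the API key is already set in the environment.

```python
import shutil
import subprocess

# The two commands from the steps above, as argument lists.
SETUP_COMMANDS = [
    ["sag", "voices"],                    # list available voices
    ["sag", "Hello from your AI agent"],  # speak a test sentence
]


def missing_binary(which=shutil.which) -> bool:
    """Return True when the sag executable cannot be found on PATH."""
    return which("sag") is None


def run_setup_checks(run=subprocess.run) -> None:
    """Run each setup command in order, stopping at the first failure."""
    for argv in SETUP_COMMANDS:
        run(argv, check=True)


if __name__ == "__main__":
    if missing_binary():
        print("sag not found on PATH; install it with: brew install steipete/tap/sag")
    else:
        run_setup_checks()
```

The which and run parameters exist only so the functions can be exercised without the real binary. In normal use they keep their defaults.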

Related Skills

Sherpa-ONNX TTS -- offline TTS alternative with no cloud dependency
Songsee -- visualize the audio files Sag generates
WaCLI -- send voice messages via WhatsApp
SonosCLI -- play generated audio on Sonos speakers

FAQ

What TTS models does Sag support?

Sag supports three ElevenLabs models: eleven_v3 (default, most expressive), eleven_multilingual_v2 (stable, multilingual), and eleven_flash_v2_5 (fastest). Choose based on your needs for expressiveness, language support, or speed.
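To make the trade-off concrete, here is a small sketch that maps a stated priority to one of the three model IDs named above. The priority labels and the pick_model function are hypothetical. How the chosen ID is passed to Sag is not covered here and should be checked against Sag's own help output.

```python
# Model IDs as listed in the answer above; the priority labels are hypothetical.
MODELS = {
    "expressive": "eleven_v3",
    "multilingual": "eleven_multilingual_v2",
    "fast": "eleven_flash_v2_5",
}
DEFAULT_MODEL = "eleven_v3"


def pick_model(priority=None) -> str:
    """Return a model ID for the given priority, or the default when none is given."""
    if priority is None:
        return DEFAULT_MODEL
    if priority not in MODELS:
        raise ValueError(f"unknown priority: {priority!r}")
    return MODELS[priority]


print(pick_model())
print(pick_model("fast"))
```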

Can Sag add emotions and expressions to speech?

Yes. With the v3 model, you can use audio tags at the start of lines: [whispers], [shouts], [sings], [laughs], [sarcastic], [curious], [excited], and more. Use [pause], [short pause], or [long pause] for timing control.

What API key does Sag need?

Sag reads its API key from the ELEVENLABS_API_KEY environment variable; SAG_API_KEY is also supported. Get your key from your ElevenLabs account dashboard at elevenlabs.io.

How is Sag different from Sherpa-ONNX TTS?

Sag uses the ElevenLabs cloud API for premium, highly expressive voices with emotional control. Sherpa-ONNX TTS runs entirely offline with no cloud dependency. Choose Sag for quality and expressiveness, and Sherpa-ONNX TTS for privacy and offline use.