Sag — OpenClaw Skill

ElevenLabs text-to-speech with premium voices, expressive audio tags, and a macOS-style say UX.

Audio & Speech Vetted

What This Skill Does

The Sag skill gives your OpenClaw agent high-quality text-to-speech powered by ElevenLabs. It offers a simple command-line experience modeled on the macOS say command, with premium voices, expressive audio tags for emotional control, pronunciation tuning, and multilingual support. Audio can be played locally or saved to a file and sent through chat providers.

The v3 model (eleven_v3, the default) is the most expressive, supporting audio tags like [whispers], [shouts], [sings], [laughs], [sarcastic], [curious], and [excited] for nuanced delivery. Pause control uses [pause], [short pause], and [long pause]. The v2 multilingual model (eleven_multilingual_v2) adds SSML <break> support, and flash v2.5 (eleven_flash_v2_5) trades some quality for speed.
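
As a minimal sketch of how tagged text might be assembled before it is handed to sag, the shell helper below prefixes a line with one of the audio tags listed above. The function name tagged is hypothetical and not part of sag; the only sag usage assumed is the plain sag "text" form shown elsewhere on this page.

```shell
# Hypothetical helper (not part of sag): prefix a line with a v3 audio tag.
# Usage: tagged TAG TEXT
tagged() {
  printf '[%s] %s\n' "$1" "$2"
}

# Build the string "[whispers] This is a secret".
line=$(tagged whispers "This is a secret")
printf '%s\n' "$line"

# To speak it (requires sag and an API key), pass the string as the text:
#   sag "$line"
```

Keeping the tag as a separate argument makes it easy to switch delivery (for example from whispers to excited) without editing the text itself.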

Sag is well suited to creating voice responses, generating audio content for sharing on WhatsApp or other messaging platforms, producing narration for video content, or adding a voice interface to your agent. Pronunciation can be fine-tuned with respelling, hyphens, casing, and the --normalize flag for numbers and URLs.

Example Prompts

Say "Hello there, welcome to the presentation" using the Roger voice

Generate a voice response explaining the quarterly results in an excited tone

Create an audio message whispering "This is a secret" and save it as an MP3

List all available ElevenLabs voices so I can pick one

Record a dramatic reading of this poem with pauses between stanzas

Say this meeting summary aloud using the multilingual model in German

Generate an audio reply as a sarcastic scientist character

Create a voice memo of these action items and send it via WhatsApp

Requirements

Binary dependency: sag must be installed. API key required.

  • Install via Homebrew: brew install steipete/tap/sag
  • API key: Set ELEVENLABS_API_KEY (or SAG_API_KEY) to a key from your ElevenLabs account
  • Optional: Set ELEVENLABS_VOICE_ID or SAG_VOICE_ID for a default voice
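
The following sketch shows one way to confirm that a key is present before the agent calls sag. The function sag_key_source is a hypothetical helper, not part of sag, and checking ELEVENLABS_API_KEY before SAG_API_KEY is an assumption; this page does not say which variable sag prefers when both are set.

```shell
# Hypothetical pre-check (not part of sag): report which documented
# variable holds the API key. Checking ELEVENLABS_API_KEY first is an
# assumption; the page does not state which one sag prefers.
sag_key_source() {
  if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY"
  elif [ -n "${SAG_API_KEY:-}" ]; then
    echo "SAG_API_KEY"
  else
    echo "none"
    return 1
  fi
}

# Print the variable in use, or a hint on stderr when neither is set.
sag_key_source || echo "Set ELEVENLABS_API_KEY or SAG_API_KEY first" >&2
```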

Setup on KiwiClaw

Add your ElevenLabs API key in the KiwiClaw dashboard settings. Sag is pre-installed and your agent can generate speech immediately. On Standard plans, a pooled ElevenLabs key may be available for included usage.

Setup Self-Hosted

  1. Install sag: brew install steipete/tap/sag
  2. Set ELEVENLABS_API_KEY in your environment
  3. List available voices: sag voices
  4. Test: sag "Hello from your AI agent"
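
The steps above can be wrapped in a small preflight so that a missing binary or key is reported before the first real call. This is a sketch under stated assumptions: sag_preflight and sag_smoke_test are hypothetical names, and the only sag invocations used are the two shown in the steps.

```shell
# Hypothetical preflight (not part of sag): check steps 1 and 2 above.
# Returns non-zero and prints a hint on stderr when a step is missing.
sag_preflight() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
  if ! command -v sag >/dev/null 2>&1; then
    echo "sag not found; try: brew install steipete/tap/sag" >&2
    return 1
  fi
  echo "ready"
}

# Hypothetical wrapper for steps 3 and 4; runs them only if preflight passes.
sag_smoke_test() {
  sag_preflight && sag voices && sag "Hello from your AI agent"
}
```

Run sag_smoke_test once after installation; it stops at the first failing step instead of producing a less obvious error from the API.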

Related Skills

  • Sherpa-ONNX TTS -- offline TTS alternative with no cloud dependency
  • Songsee -- visualize the audio files Sag generates
  • WaCLI -- send voice messages via WhatsApp
  • SonosCLI -- play generated audio on Sonos speakers

FAQ

What TTS models does Sag support?

Sag supports three ElevenLabs models: eleven_v3 (default, most expressive), eleven_multilingual_v2 (stable, multilingual), and eleven_flash_v2_5 (fastest). Choose based on your needs for expressiveness, language support, or speed.
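
To make the choice concrete, the sketch below maps a priority to one of the three model IDs named in this answer. pick_model and its priority keywords are hypothetical and not part of sag; how a model ID is passed to sag is not covered on this page, so consult the CLI's own help output for that.

```shell
# Hypothetical helper (not part of sag): map a priority to a model ID
# named in the answer above.
pick_model() {
  case "$1" in
    expressive)   echo "eleven_v3" ;;
    multilingual) echo "eleven_multilingual_v2" ;;
    fast)         echo "eleven_flash_v2_5" ;;
    *)            echo "unknown priority: $1" >&2; return 1 ;;
  esac
}

# Print the model ID for a multilingual use case.
pick_model multilingual
```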

Can Sag add emotions and expressions to speech?

Yes. With the v3 model, you can use audio tags at the start of lines: [whispers], [shouts], [sings], [laughs], [sarcastic], [curious], [excited], and more. Use [pause], [short pause], or [long pause] for timing control.
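
For timing control across a longer text such as a poem, a pause tag can be inserted between stanzas before the text is spoken. The sketch below assumes stanzas are separated by blank lines and that a pause tag on its own line is acceptable input; add_stanza_pauses is a hypothetical helper, not part of sag.

```shell
# Hypothetical helper (not part of sag): read text on stdin, treat blank
# lines as stanza breaks, and emit a [long pause] tag between stanzas.
add_stanza_pauses() {
  # RS = "" puts awk in paragraph mode: each blank-line-separated stanza
  # becomes one record. A pause tag is printed before every stanza but the first.
  awk 'BEGIN { RS = "" } NR > 1 { print "[long pause]" } { print }'
}

poem=$(printf 'Line one\nLine two\n\nLine three\n' | add_stanza_pauses)
printf '%s\n' "$poem"

# To speak the result (requires sag and an API key):
#   sag "$poem"
```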

What API key does Sag need?

Sag reads its API key from the ELEVENLABS_API_KEY environment variable; SAG_API_KEY is also supported. Get your key from your ElevenLabs account dashboard at elevenlabs.io.

How is Sag different from Sherpa-ONNX TTS?

Sag uses the ElevenLabs cloud API for premium, highly expressive voices with emotional control. Sherpa-ONNX TTS runs entirely offline with no cloud dependency. Choose Sag for quality and expressiveness, and Sherpa-ONNX TTS for privacy and offline use.
