# Songsee — OpenClaw Skill
Generate spectrograms and feature-panel visualizations from audio files.
## What This Skill Does
The Songsee skill gives your OpenClaw agent the ability to generate spectrograms and multi-panel audio visualizations from any audio file. It wraps the `songsee` CLI, which supports nine visualization types: standard spectrogram, mel spectrogram, chroma, HPSS (harmonic-percussive source separation), self-similarity matrix, loudness curve, tempogram, MFCC, and spectral flux.
You can render a single visualization panel or combine multiple into a grid layout for comprehensive audio analysis. The tool supports time-slicing to focus on specific segments, customizable color palettes (classic, magma, inferno, viridis, gray), adjustable FFT parameters, and configurable output dimensions. WAV and MP3 files are decoded natively, while other formats use ffmpeg if available.
This is particularly useful for music production, audio engineering, podcast analysis, and any workflow where you need a visual representation of audio content. Output is saved as JPG or PNG, and audio can be piped in via stdin for integration into larger audio processing pipelines.
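A minimal invocation might look like the sketch below. The `--format` and `-o` flags come from the stdin example in the FAQ; the file-existence and PATH guards keep the snippet safe to run in any environment.

```shell
# Minimal sketch: render a spectrogram of a WAV file to PNG.
# --format and -o are the output flags shown in the FAQ below;
# panel-type and palette options are omitted here.
if command -v songsee >/dev/null 2>&1 && [ -f input.wav ]; then
  songsee input.wav --format png -o spectrogram.png
else
  echo "skipping: needs songsee in PATH and an input.wav in the current directory"
fi
```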
## Example Prompts
- "Generate a spectrogram of this podcast episode so I can see where the conversation gets loud"
- "Create a multi-panel visualization of track.mp3 showing spectrogram, mel, chroma, and loudness"
- "Show me the frequency content of the first 30 seconds of this audio file"
- "Visualize the tempo changes in this song using a tempogram"
- "Generate a self-similarity matrix for this audio to help me find repeated sections"
- "Create a spectrogram of the intro from 0 to 8 seconds using the magma color palette"
- "Show me the MFCC features for this voice recording to check audio quality"
## Requirements

**Binary dependency:** `songsee` must be installed and available in `PATH`.

- Install via Homebrew: `brew install steipete/tap/songsee`
- Optional: `ffmpeg` for non-WAV/MP3 audio format support
## Setup on KiwiClaw
Songsee is pre-installed on all KiwiClaw plans. Upload or reference an audio file and ask your agent to generate a visualization -- no installation or configuration required. Manage your agent's skills from the KiwiClaw dashboard.
## Self-Hosted Setup
- Install songsee via Homebrew: `brew install steipete/tap/songsee`
- Optionally install ffmpeg for broader format support: `brew install ffmpeg`
- Verify installation: `songsee --version`
- The skill activates automatically when your agent needs to visualize audio
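The steps above can be bundled into a quick environment check. This sketch only uses commands named in this section and is safe to run whether or not anything is installed:

```shell
# Sanity-check the toolchain described above.
if command -v songsee >/dev/null 2>&1; then
  songsee --version
else
  echo "songsee missing from PATH; run: brew install steipete/tap/songsee"
fi

if command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg available: non-WAV/MP3 formats supported"
else
  echo "ffmpeg not found: only WAV and MP3 inputs will decode"
fi
```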
## Related Skills
- Video Frames -- extract visual frames from video files for analysis
- Sherpa-ONNX TTS -- generate audio that you can then visualize with Songsee
- Sag -- create TTS audio with ElevenLabs, then inspect with Songsee
- SonosCLI -- play the audio you are analyzing on Sonos speakers
## FAQ
### What audio formats does Songsee support?
Songsee natively decodes WAV and MP3 files. For other audio formats like FLAC, OGG, or AAC, it uses ffmpeg if available on the system. Output visualizations can be saved as JPG or PNG.
### What types of visualizations can Songsee generate?

Songsee supports nine visualization types: spectrogram, mel spectrogram, chroma, HPSS (harmonic-percussive separation), self-similarity, loudness, tempogram, MFCC, and spectral flux. You can render a single panel or combine multiple into a grid using the `--viz` flag.
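As a sketch of a grid render: the `--viz` flag is named above, but the comma-separated list syntax and the exact panel identifiers used here are assumptions — check `songsee --help` for the real spelling.

```shell
# Sketch: render a four-panel grid (spectrogram, mel, chroma, loudness).
# The comma-separated value for --viz is an assumed syntax.
if command -v songsee >/dev/null 2>&1 && [ -f track.mp3 ]; then
  songsee track.mp3 --viz spectrogram,mel,chroma,loudness -o panels.png
else
  echo "skipping: needs songsee in PATH and a track.mp3 in the current directory"
fi
```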
### Can I visualize just a portion of an audio file?

Yes. Use the `--start` and `--duration` flags to specify a time slice. For example, `--start 12.5 --duration 8` will visualize only eight seconds starting at the 12.5 second mark.
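Putting the time-slice flags from the answer above into a full command (the input filename is a placeholder):

```shell
# Sketch: visualize eight seconds starting at 12.5s, using the
# --start/--duration flags described above.
if command -v songsee >/dev/null 2>&1 && [ -f song.wav ]; then
  songsee song.wav --start 12.5 --duration 8 -o intro.png
else
  echo "skipping: needs songsee in PATH and a song.wav in the current directory"
fi
```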
### Does Songsee work with audio from stdin?

Yes. You can pipe audio data through stdin using the dash syntax: `cat track.mp3 | songsee - --format png -o out.png`. This is useful for processing audio from other tools in a pipeline.
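One way to combine this with ffmpeg is to transcode an unsupported format to WAV and pipe it straight in. This is a sketch: the dash syntax and output flags come from the answer above, while the assumption that songsee accepts WAV on stdin (and the `input.flac` filename) are mine.

```shell
# Pipeline sketch: convert FLAC to WAV with ffmpeg, pipe into songsee.
# Assumes songsee can decode WAV arriving on stdin via the dash syntax.
if command -v songsee >/dev/null 2>&1 && command -v ffmpeg >/dev/null 2>&1 && [ -f input.flac ]; then
  ffmpeg -i input.flac -f wav - | songsee - --format png -o out.png
else
  echo "skipping: needs songsee, ffmpeg, and an input.flac in the current directory"
fi
```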