# Songsee — OpenClaw Skill
Generate spectrograms and feature-panel visualizations from audio files.
## What This Skill Does
The Songsee skill gives your OpenClaw agent the ability to generate spectrograms and multi-panel audio visualizations from any audio file. It wraps the `songsee` CLI, which supports nine visualization types: standard spectrogram, mel spectrogram, chroma, HPSS (harmonic-percussive source separation), self-similarity matrix, loudness curve, tempogram, MFCC, and spectral flux.
You can render a single visualization panel or combine multiple into a grid layout for comprehensive audio analysis. The tool supports time-slicing to focus on specific segments, customizable color palettes (classic, magma, inferno, viridis, gray), adjustable FFT parameters, and configurable output dimensions. WAV and MP3 files are decoded natively, while other formats use ffmpeg if available.
This is particularly useful for music production, audio engineering, podcast analysis, and any workflow where you need a visual representation of audio content. Output is saved as JPG or PNG, and audio can be piped in via stdin for integration into larger audio processing pipelines.
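A minimal invocation might look like the sketch below. The `--format` and `-o` flags come from the stdin example in the FAQ; the file-existence and PATH guards keep the snippet safe to run in any environment.

```shell
# Minimal sketch: render a spectrogram of a WAV file to PNG.
# --format and -o are the output flags shown in the FAQ below;
# panel-type and palette options are omitted here.
if command -v songsee >/dev/null 2>&1 && [ -f input.wav ]; then
  songsee input.wav --format png -o spectrogram.png
else
  echo "skipping: needs songsee in PATH and an input.wav in the current directory"
fi
```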
## Example Prompts
- "Generate a spectrogram of this podcast episode so I can see where the conversation gets loud"
- "Create a multi-panel visualization of track.mp3 showing spectrogram, mel, chroma, and loudness"
- "Show me the frequency content of the first 30 seconds of this audio file"
- "Visualize the tempo changes in this song using a tempogram"
- "Generate a self-similarity matrix for this audio to help me find repeated sections"
- "Create a spectrogram of the intro from 0 to 8 seconds using the magma color palette"
- "Show me the MFCC features for this voice recording to check audio quality"
## Requirements

**Binary dependency:** `songsee` must be installed and available in `PATH`.

- Install via Homebrew: `brew install steipete/tap/songsee`
- Optional: `ffmpeg` for non-WAV/MP3 audio format support
## Setup on KiwiClaw
Songsee is pre-installed on all KiwiClaw plans. Upload or reference an audio file and ask your agent to generate a visualization -- no installation or configuration required. Manage your agent's skills from the KiwiClaw dashboard.
## Self-Hosted Setup
- Install songsee via Homebrew: `brew install steipete/tap/songsee`
- Optionally install ffmpeg for broader format support: `brew install ffmpeg`
- Verify installation: `songsee --version`
- The skill activates automatically when your agent needs to visualize audio
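The steps above can be bundled into a quick environment check. This sketch only uses commands named in this section and is safe to run whether or not anything is installed:

```shell
# Sanity-check the toolchain described above.
if command -v songsee >/dev/null 2>&1; then
  songsee --version
else
  echo "songsee missing from PATH; run: brew install steipete/tap/songsee"
fi

if command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg available: non-WAV/MP3 formats supported"
else
  echo "ffmpeg not found: only WAV and MP3 inputs will decode"
fi
```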
## Related Skills
- Video Frames -- extract visual frames from video files for analysis
- Sherpa-ONNX TTS -- generate audio that you can then visualize with Songsee
- Sag -- create TTS audio with ElevenLabs, then inspect with Songsee
- SonosCLI -- play the audio you are analyzing on Sonos speakers
## FAQ
### What audio formats does Songsee support?
Songsee natively decodes WAV and MP3 files. For other audio formats like FLAC, OGG, or AAC, it uses ffmpeg if available on the system. Output visualizations can be saved as JPG or PNG.
### What types of visualizations can Songsee generate?

Songsee supports nine visualization types: spectrogram, mel spectrogram, chroma, HPSS (harmonic-percussive separation), self-similarity, loudness, tempogram, MFCC, and spectral flux. You can render a single panel or combine multiple into a grid using the `--viz` flag.
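As a sketch of a grid render: the `--viz` flag is named above, but the comma-separated list syntax and the exact panel identifiers used here are assumptions — check `songsee --help` for the real spelling.

```shell
# Sketch: render a four-panel grid (spectrogram, mel, chroma, loudness).
# The comma-separated value for --viz is an assumed syntax.
if command -v songsee >/dev/null 2>&1 && [ -f track.mp3 ]; then
  songsee track.mp3 --viz spectrogram,mel,chroma,loudness -o panels.png
else
  echo "skipping: needs songsee in PATH and a track.mp3 in the current directory"
fi
```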
### Can I visualize just a portion of an audio file?

Yes. Use the `--start` and `--duration` flags to specify a time slice. For example, `--start 12.5 --duration 8` will visualize only eight seconds starting at the 12.5 second mark.
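Putting the time-slice flags from the answer above into a full command (the input filename is a placeholder):

```shell
# Sketch: visualize eight seconds starting at 12.5s, using the
# --start/--duration flags described above.
if command -v songsee >/dev/null 2>&1 && [ -f song.wav ]; then
  songsee song.wav --start 12.5 --duration 8 -o intro.png
else
  echo "skipping: needs songsee in PATH and a song.wav in the current directory"
fi
```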
### Does Songsee work with audio from stdin?

Yes. You can pipe audio data through stdin using the dash syntax: `cat track.mp3 | songsee - --format png -o out.png`. This is useful for processing audio from other tools in a pipeline.
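One way to combine this with ffmpeg is to transcode an unsupported format to WAV and pipe it straight in. This is a sketch: the dash syntax and output flags come from the answer above, while the assumption that songsee accepts WAV on stdin (and the `input.flac` filename) are mine.

```shell
# Pipeline sketch: convert FLAC to WAV with ffmpeg, pipe into songsee.
# Assumes songsee can decode WAV arriving on stdin via the dash syntax.
if command -v songsee >/dev/null 2>&1 && command -v ffmpeg >/dev/null 2>&1 && [ -f input.flac ]; then
  ffmpeg -i input.flac -f wav - | songsee - --format png -o out.png
else
  echo "skipping: needs songsee, ffmpeg, and an input.flac in the current directory"
fi
```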