Audio CLI Editing Reference
Summary
This report outlines practical techniques for CLI-based audio post-production in three areas: multitrack timeline editing, visual waveform/spectrogram inspection, and automatic music ducking under voice-overs (VO). FFmpeg is the foundational tool for these tasks, and its filter_complex option is central to both timeline editing and ducking. The audiowaveform tool generates waveform data and images efficiently for visual inspection in headless environments. These patterns are well suited to integration with our Go-based mcp-avtool-go wrapper running in Docker.
Detailed Findings
1. CLI-Based Multitrack Audio Timeline Editing
FFmpeg’s filter_complex acts as a headless non-linear editor. Building an audio timeline primarily relies on the adelay filter (for clip positioning), atrim (for cutting), and amix (for compositing tracks).
Recipe: 3-Track Mix with Delays and Fades
The command below lays out three tracks. Track 1 starts immediately at gain 0.8. Track 2 starts at 5 seconds with a 2-second fade-in. Track 3 starts at 10 seconds at gain 0.5.
ffmpeg -i track1.mp3 -i track2.mp3 -i track3.mp3 -filter_complex \
"[0:a]volume=0.8[t1]; \
[1:a]adelay=5000|5000,afade=t=in:st=5:d=2[t2]; \
[2:a]adelay=10000|10000,volume=0.5[t3]; \
[t1][t2][t3]amix=inputs=3:normalize=0[out]" \
-map "[out]" output.mp3
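- Crucial flags:
  - adelay=5000|5000 specifies one delay per channel, here both channels of a stereo input. A single value would delay only the first channel. Newer FFmpeg builds also accept adelay=5000:all=1, which applies the same delay to every channel.
  - amix=normalize=0 stops the remaining tracks from suddenly getting louder ("pumping") when one input stream ends. By default, amix scales each input by 1/n, where n is the number of currently active inputs. With normalize=0 the inputs are simply summed, so keep per-track gains low enough to avoid clipping (for example, with the volume filters above).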
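The Go wrapper can generate the clip layout above instead of hand-writing it. The following is a minimal sketch under a few assumptions. The Clip struct and the BuildTimelineGraph function are hypothetical and are not part of mcp-avtool-go or any library. The sketch emits one adelay/afade/volume chain per clip plus a final amix=normalize=0. It uses the adelay all=1 form, so it assumes an FFmpeg build that supports that option.

```go
package main

import (
	"fmt"
	"strings"
)

// Clip is a hypothetical description of one timeline track.
type Clip struct {
	Input   int     // FFmpeg input index (the N in [N:a])
	StartMs int     // offset on the timeline, in milliseconds
	Volume  float64 // linear gain; 1 leaves the level unchanged, 0 mutes the clip, so set it explicitly
	FadeInS float64 // fade-in duration in seconds; 0 disables the fade
}

// BuildTimelineGraph emits one filter chain per clip and a final amix with
// normalize=0, following the 3-track recipe.
func BuildTimelineGraph(clips []Clip) string {
	var chains []string
	var labels strings.Builder
	for i, c := range clips {
		var filters []string
		if c.StartMs > 0 {
			// all=1 applies the same delay to every channel (newer FFmpeg builds).
			filters = append(filters, fmt.Sprintf("adelay=%d:all=1", c.StartMs))
		}
		if c.FadeInS > 0 {
			// After adelay the audio begins at StartMs, so the fade starts there as well.
			filters = append(filters, fmt.Sprintf("afade=t=in:st=%g:d=%g", float64(c.StartMs)/1000, c.FadeInS))
		}
		if c.Volume != 1 {
			filters = append(filters, fmt.Sprintf("volume=%g", c.Volume))
		}
		if len(filters) == 0 {
			filters = append(filters, "anull") // pass-through so every chain has a filter
		}
		label := fmt.Sprintf("t%d", i+1)
		chains = append(chains, fmt.Sprintf("[%d:a]%s[%s]", c.Input, strings.Join(filters, ","), label))
		labels.WriteString("[" + label + "]")
	}
	mix := fmt.Sprintf("%samix=inputs=%d:normalize=0[out]", labels.String(), len(clips))
	return strings.Join(append(chains, mix), ";")
}

func main() {
	graph := BuildTimelineGraph([]Clip{
		{Input: 0, Volume: 0.8},
		{Input: 1, StartMs: 5000, Volume: 1, FadeInS: 2},
		{Input: 2, StartMs: 10000, Volume: 0.5},
	})
	fmt.Println(graph)
}
```

For the three tracks in the recipe, this produces a graph equivalent to the hand-written one, with all=1 in place of the per-channel 5000|5000 form.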
Go Integration Note: For complex timelines with many clips, an inline filtergraph string can grow large enough to hit operating-system command-line length limits. The Go tool should instead write the filtergraph string to a temporary .txt file and pass its path via -filter_complex_script.
2. Visual Inspection of Multitrack Audio
For headless environments and agents, generating image representations of audio waveforms and spectrograms is essential for debugging timing and frequency overlaps.
A. audiowaveform (BBC)
Highly recommended for waveforms due to its speed and precision.
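- Generate Waveform Image:
audiowaveform -i input.wav -o waveform.png --width 1000 --height 200 --waveform-color '#0000ff' --background-color '#ffffff'
- Note: It accepts only a limited set of input formats (WAV, MP3, FLAC, and Ogg), but it is typically much faster than FFmpeg for this job. By default it renders at a fixed zoom level (samples per pixel), so a long file may not fit in the image. Passing --zoom auto scales the whole file to the image width.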
B. FFmpeg (showwavespic and showspectrumpic)
If audiowaveform is not installed in the Docker container, FFmpeg natively supports visual generation.
- Waveform (colors takes one color per channel, separated by |):
ffmpeg -i input.mp3 -filter_complex "showwavespic=s=1200x400:colors=red|blue" -frames:v 1 waveform.png
- Spectrogram (frequency analysis for overlapping EQ issues):
ffmpeg -i input.mp3 -lavfi showspectrumpic=s=1200x600:color=magma:legend=1 spectrogram.png
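The fallback between options A and B can be decided at runtime. The following is a minimal Go sketch that assumes a hypothetical WaveformCommand helper. Availability is passed in as a boolean so the choice is easy to test, and main checks it with exec.LookPath. Only the flags shown in the commands above are used. Colors and other styling are left out.

```go
package main

import (
	"fmt"
	"os/exec"
)

// WaveformCommand returns the program and arguments for rendering a waveform
// PNG: audiowaveform when available, otherwise FFmpeg's showwavespic filter.
func WaveformCommand(haveAudiowaveform bool, input, output string, width, height int) (string, []string) {
	if haveAudiowaveform {
		return "audiowaveform", []string{
			"-i", input, "-o", output,
			"--width", fmt.Sprint(width), "--height", fmt.Sprint(height),
		}
	}
	return "ffmpeg", []string{
		"-i", input,
		"-filter_complex", fmt.Sprintf("showwavespic=s=%dx%d", width, height),
		"-frames:v", "1", output,
	}
}

func main() {
	// LookPath fails when audiowaveform is not installed in the container.
	_, err := exec.LookPath("audiowaveform")
	prog, args := WaveformCommand(err == nil, "input.wav", "waveform.png", 1000, 200)
	fmt.Println(prog, args)
}
```

Because audiowaveform accepts fewer input formats, a caller could also route unsupported extensions straight to the FFmpeg branch.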
3. Music Ducking During Voice-Over
Automatic ducking reduces background music volume when a voice track is active. The sidechaincompress filter is the standard for this in FFmpeg.
Recipe: Standard Ducking
- Split the voice track with asplit.
- Use one copy as the sidechain that triggers the compressor on the background music.
- Mix the compressed music with the other voice copy.
ffmpeg -i music.mp3 -i voice.mp3 -filter_complex \
"[1:a]asplit=2[sc][voice_mix]; \
[0:a][sc]sidechaincompress=threshold=0.03:ratio=5:attack=20:release=300[bg_ducked]; \
[bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]" \
-map "[out]" output.mp3
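- Parameters:
  - threshold=0.03: the sidechain (voice) level above which ducking starts. It is a linear amplitude, about -30 dBFS. Lower it if ducking does not trigger.
  - ratio=5: compression ratio. Higher values duck the music more deeply.
  - attack=20: the music ducks over roughly 20 ms once the voice crosses the threshold.
  - release=300: the music recovers over roughly 300 ms after the voice falls below the threshold.
  - duration=first: the mixed output ends when the first amix input (the ducked music) ends.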
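Recipe: Advanced “Lookahead” Ducking
A slight volume blip can occur at the very beginning of words. To prevent it, trim the start of the sidechain trigger so it runs slightly ahead of the audible voice and the compressor reacts early. Resetting timestamps with asetpts after the trim keeps the shifted trigger starting at zero.
ffmpeg -i music.mp3 -i voice.mp3 -filter_complex \
"[1:a]asplit=2[sc_raw][voice_mix]; \
[sc_raw]atrim=start=0.05,asetpts=PTS-STARTPTS[sc_shifted]; \
[0:a][sc_shifted]sidechaincompress=threshold=0.03:ratio=5:attack=10:release=300[bg_ducked]; \
[bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]" \
-map "[out]" output.mp3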
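The following is a minimal Go sketch of the lookahead recipe that makes the lookahead amount configurable. LookaheadDuckArgs is a hypothetical helper. It returns arguments for exec.Command("ffmpeg", args...). As a common precaution, it resets the trimmed trigger's timestamps with asetpts=PTS-STARTPTS.

```go
package main

import (
	"fmt"
	"strconv"
)

// LookaheadDuckArgs returns exec.Command-ready arguments for lookahead ducking.
// lookaheadMs is how much of the voice sidechain is trimmed off the front,
// so the compressor reacts that many milliseconds before the voice is heard.
func LookaheadDuckArgs(music, voice, output string, lookaheadMs int) []string {
	start := strconv.FormatFloat(float64(lookaheadMs)/1000, 'f', -1, 64)
	graph := "[1:a]asplit=2[sc_raw][voice_mix];" +
		// Drop the first lookaheadMs of the trigger and restart its timestamps at zero.
		"[sc_raw]atrim=start=" + start + ",asetpts=PTS-STARTPTS[sc_shifted];" +
		"[0:a][sc_shifted]sidechaincompress=threshold=0.03:ratio=5:attack=10:release=300[bg_ducked];" +
		"[bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]"
	return []string{"-i", music, "-i", voice, "-filter_complex", graph, "-map", "[out]", output}
}

func main() {
	for _, a := range LookaheadDuckArgs("music.mp3", "voice.mp3", "output.mp3", 50) {
		fmt.Println(a)
	}
}
```

A lookahead of about 50 ms, as in the recipe, is a starting point to tune by ear rather than a fixed rule.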
Sources
- FFmpeg Filters - Timeline Editing - Official docs on adelay, atrim, and amix.
- FFmpeg Filters - sidechaincompress - Ducking parameters.
- BBC audiowaveform GitHub - High-speed waveform generation.
Open Questions
- Docker Container Tooling: Is audiowaveform installed in our base container? If not, we will need to update the Dockerfile or rely solely on FFmpeg's showwavespic.
- amix Volume Scaling: Because normalize=0 sums inputs without scaling, input levels may still need manual adjustment to avoid clipping. We need a reliable programmatic way in our Go tool to normalize voice and music stems before they reach the mix stage.