Audio CLI Editing Reference

Summary

This report outlines practical techniques for CLI-based audio post-production: multitrack timeline editing, visual waveform/spectrogram inspection, and automatic ducking of music under voice-overs (VO). FFmpeg is the foundation for all three, with its filter_complex option central to timeline editing and ducking. The BBC's audiowaveform tool generates waveform data and images for visual inspection in headless environments. These patterns are intended for integration with our Go-based mcp-avtool-go wrapper in Docker.

Detailed Findings

1. CLI-Based Multitrack Audio Timeline Editing

FFmpeg’s filter_complex acts as a headless non-linear editor. Building an audio timeline primarily relies on the adelay filter (for clip positioning), atrim (for cutting), and amix (for compositing tracks).

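Recipe: 3-Track Mix with Delays and Fades

To lay out three tracks where Track 1 starts immediately, Track 2 starts at 5 seconds with a 2-second fade-in, and Track 3 starts at 10 seconds. The fade uses st=5 because adelay pads the stream with leading silence, so times after adelay are on the output timeline: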

ffmpeg -i track1.mp3 -i track2.mp3 -i track3.mp3 -filter_complex \
"[0:a]volume=0.8[t1]; \
 [1:a]adelay=5000|5000,afade=t=in:st=5:d=2[t2]; \
 [2:a]adelay=10000|10000,volume=0.5[t3]; \
 [t1][t2][t3]amix=inputs=3:normalize=0[out]" \
 -map "[out]" output.mp3

Crucial flags:
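- adelay=5000|5000 sets one delay per channel, in milliseconds (here, left and right). Channels with no listed delay are not delayed. Newer FFmpeg versions also accept adelay=5000:all=1, which applies the last listed delay to all remaining channels.
- amix=normalize=0 disables amix's default input scaling. With the default (normalize=1), amix scales each input by roughly 1/n of the currently active inputs, so the level jumps (“pumping”) when one input ends. With normalize=0 the inputs are simply summed, so set per-track volume to leave headroom and avoid clipping.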
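
The recipe above can be generated from data rather than written by hand. The following is a minimal Go sketch, not the wrapper's actual code: the Clip type and BuildTimelineFilter function are hypothetical names, and it assumes input i of the ffmpeg command corresponds to clips[i]. It omits adelay for clips that start at zero, as the recipe does for Track 1.

```go
package main

import (
	"fmt"
	"strings"
)

// Clip is a hypothetical description of one track on the timeline.
type Clip struct {
	StartMs int     // where the clip begins on the timeline, in milliseconds
	Volume  float64 // linear gain; 1.0 leaves the level unchanged
}

// BuildTimelineFilter returns a filter_complex string that delays each input
// to its start time, applies its gain, and sums all tracks with amix.
// It returns an empty string when there are no clips.
func BuildTimelineFilter(clips []Clip) string {
	if len(clips) == 0 {
		return ""
	}
	var parts []string
	var labels strings.Builder
	for i, c := range clips {
		var chain []string
		if c.StartMs > 0 {
			// Two delay values cover stereo; unused delays are ignored for mono.
			chain = append(chain, fmt.Sprintf("adelay=%d|%d", c.StartMs, c.StartMs))
		}
		chain = append(chain, fmt.Sprintf("volume=%g", c.Volume))
		parts = append(parts, fmt.Sprintf("[%d:a]%s[t%d]", i, strings.Join(chain, ","), i))
		fmt.Fprintf(&labels, "[t%d]", i)
	}
	// normalize=0 sums the tracks without amix's per-input scaling.
	parts = append(parts, fmt.Sprintf("%samix=inputs=%d:normalize=0[out]",
		labels.String(), len(clips)))
	return strings.Join(parts, ";")
}

func main() {
	fmt.Println(BuildTimelineFilter([]Clip{
		{StartMs: 0, Volume: 0.8},
		{StartMs: 5000, Volume: 1},
		{StartMs: 10000, Volume: 0.5},
	}))
}
```

Fades are left out to keep the sketch short; an afade stage could be appended to each clip's chain in the same way.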

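Go Integration Note: For complex timelines with dozens of clips, a single command line becomes unwieldy and can exceed OS argument-length limits. The Go tool should write the filtergraph string to a temporary .txt file and pass it via -filter_complex_script. (FFmpeg 7.0 and later deprecate this option in favor of -/filter_complex <file>, so check the FFmpeg version in the container.)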
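
One way the temporary-file approach might look in Go is sketched below. WriteFilterScript is a hypothetical helper name, not part of mcp-avtool-go; it assumes the container's FFmpeg accepts -filter_complex_script and returns a cleanup function so the caller can delete the file once ffmpeg has exited.

```go
package main

import (
	"fmt"
	"os"
)

// WriteFilterScript saves a filtergraph to a temporary file and returns the
// ffmpeg arguments that load it, plus a cleanup function that removes the file.
func WriteFilterScript(graph string) ([]string, func(), error) {
	f, err := os.CreateTemp("", "filtergraph-*.txt")
	if err != nil {
		return nil, nil, err
	}
	path := f.Name()
	cleanup := func() { os.Remove(path) }

	if _, err := f.WriteString(graph); err != nil {
		f.Close()
		cleanup()
		return nil, nil, err
	}
	// Close before handing the path to ffmpeg so the contents are flushed.
	if err := f.Close(); err != nil {
		cleanup()
		return nil, nil, err
	}
	return []string{"-filter_complex_script", path}, cleanup, nil
}

func main() {
	args, cleanup, err := WriteFilterScript("[0:a]volume=0.8[out]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	// These arguments would be spliced into the ffmpeg argument list
	// in place of -filter_complex "<graph>".
	fmt.Println(args)
}
```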

2. Visual Inspection of Multitrack Audio

For headless environments and agents, generating image representations of audio waveforms and spectrograms is essential for debugging timing and frequency overlaps.

A. audiowaveform (BBC)

Highly recommended for waveforms due to speed and precision.

Generate Waveform Image:

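audiowaveform -i input.wav -o waveform.png -z auto --width 1000 --height 200 --waveform-color 0000ff --background-color ffffff

Note: Colors are given as rrggbb[aa] hex values without a leading #. Without a zoom setting, audiowaveform renders at a fixed 256 samples per pixel, which shows only the first few seconds at this width; -z auto fits the whole file to the image. It reads only MP3, WAV, FLAC, Ogg Vorbis, and (in recent versions) Opus, so other formats must be transcoded first, but it is typically faster than FFmpeg for waveform rendering.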

B. FFmpeg (showwavespic and showspectrumpic)

If audiowaveform is not installed in the Docker container, FFmpeg natively supports visual generation.

Waveform:

ffmpeg -i input.mp3 -filter_complex "showwavespic=s=1200x400:colors=red|blue" -frames:v 1 waveform.png

Spectrogram (frequency analysis for overlapping EQ issues):

ffmpeg -i input.mp3 -lavfi showspectrumpic=s=1200x600:color=magma:legend=1 spectrogram.png
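
The fallback between the two tools could be decided at runtime. The sketch below is an illustration only: WaveformCommand is a hypothetical function, and it assumes tool availability is detected with exec.LookPath. It builds the argument list without running anything, which keeps it easy to test.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// WaveformCommand returns the program and arguments for rendering a waveform
// PNG. It uses audiowaveform when haveAudiowaveform is true and otherwise
// falls back to FFmpeg's showwavespic filter.
func WaveformCommand(haveAudiowaveform bool, in, out string, width, height int) (string, []string) {
	if haveAudiowaveform {
		return "audiowaveform", []string{
			"-i", in, "-o", out,
			"--width", strconv.Itoa(width),
			"--height", strconv.Itoa(height),
		}
	}
	filter := fmt.Sprintf("showwavespic=s=%dx%d", width, height)
	return "ffmpeg", []string{"-i", in, "-filter_complex", filter, "-frames:v", "1", out}
}

func main() {
	// LookPath reports an error when the binary is not on PATH.
	_, err := exec.LookPath("audiowaveform")
	name, args := WaveformCommand(err == nil, "input.wav", "waveform.png", 1000, 200)
	fmt.Println(name, args)
}
```

Passing the availability flag in, rather than calling LookPath inside the function, lets the same code be exercised on machines where neither tool is installed.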

3. Music Ducking During Voice-Over

Automatic ducking reduces background music volume when a voice track is active. The sidechaincompress filter is the standard for this in FFmpeg.

Recipe: Standard Ducking

Split the voice track (asplit).
Use one split to trigger the compressor on the background music.
Mix the compressed music with the other voice split.

ffmpeg -i music.mp3 -i voice.mp3 -filter_complex \
"[1:a]asplit=2[sc][voice_mix]; \
 [0:a][sc]sidechaincompress=threshold=0.03:ratio=5:attack=20:release=300[bg_ducked]; \
 [bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]" \
-map "[out]" output.mp3

Parameters:
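- threshold=0.03: Linear amplitude (about -30 dBFS) that the VO must exceed to trigger ducking. Lower it if ducking is not triggering.
- ratio=5: Compression ratio (5:1) applied to the music while the VO is above the threshold; higher values duck more deeply.
- attack=20: The music fades down over about 20 ms once the VO crosses the threshold.
- release=300: The music recovers over about 300 ms after the VO drops below the threshold.
- duration=first: An amix option; the output ends when the first amix input (the ducked music) ends.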
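
Because sidechaincompress takes its threshold as a linear amplitude, it can be convenient to configure it in dBFS and convert with 10^(dB/20); 0.03 corresponds to roughly -30.5 dBFS. The following sketch shows one possible parameterization. DuckParams, DBToLinear, and BuildDuckingFilter are hypothetical names, and the filtergraph assumes input 0 is the music and input 1 is the voice-over, as in the recipe.

```go
package main

import (
	"fmt"
	"math"
)

// DuckParams holds hypothetical tuning knobs for the ducking recipe.
type DuckParams struct {
	ThresholdDB float64 // VO level in dBFS above which the music is ducked
	Ratio       float64 // compression ratio passed to sidechaincompress
	AttackMs    float64 // time for the music to fade down
	ReleaseMs   float64 // time for the music to recover
}

// DBToLinear converts a level in dBFS to a linear amplitude.
func DBToLinear(db float64) float64 {
	return math.Pow(10, db/20)
}

// BuildDuckingFilter returns the filtergraph for the standard ducking recipe.
func BuildDuckingFilter(p DuckParams) string {
	return fmt.Sprintf(
		"[1:a]asplit=2[sc][voice_mix];"+
			"[0:a][sc]sidechaincompress=threshold=%.4f:ratio=%g:attack=%g:release=%g[bg_ducked];"+
			"[bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]",
		DBToLinear(p.ThresholdDB), p.Ratio, p.AttackMs, p.ReleaseMs)
}

func main() {
	fmt.Println(BuildDuckingFilter(DuckParams{
		ThresholdDB: -30, Ratio: 5, AttackMs: 20, ReleaseMs: 300,
	}))
}
```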

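Recipe: Advanced “Lookahead” Ducking

To prevent a slight volume blip at the very beginning of words, trim the first 50 ms from the sidechain trigger so the compressor reacts slightly before the audible voice. asetpts=PTS-STARTPTS resets the trimmed stream's timestamps so it starts at zero:

ffmpeg -i music.mp3 -i voice.mp3 -filter_complex \
"[1:a]asplit=2[sc_raw][voice_mix]; \
 [sc_raw]atrim=start=0.05,asetpts=PTS-STARTPTS[sc_shifted]; \
 [0:a][sc_shifted]sidechaincompress=threshold=0.03:ratio=5:attack=10:release=300[bg_ducked]; \
 [bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]" \
-map "[out]" output.mp3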
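
The lookahead amount is a natural tuning parameter for the Go tool. The sketch below is one possible helper under stated assumptions: SidechainShift is a hypothetical name, and the asetpts=PTS-STARTPTS stage is added here as a precaution so the trimmed trigger starts at timestamp zero. It returns anull (a pass-through filter) when no lookahead is requested.

```go
package main

import "fmt"

// SidechainShift returns the filter chain that advances the sidechain trigger
// by lookaheadMs milliseconds, by trimming that much audio from its start.
func SidechainShift(lookaheadMs int) string {
	if lookaheadMs <= 0 {
		return "anull" // pass the trigger through unchanged
	}
	return fmt.Sprintf("atrim=start=%.3f,asetpts=PTS-STARTPTS",
		float64(lookaheadMs)/1000)
}

func main() {
	// The result would sit between [sc_raw] and [sc_shifted] in the filtergraph.
	fmt.Printf("[sc_raw]%s[sc_shifted]\n", SidechainShift(50))
}
```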

Sources

FFmpeg Filters - Timeline Editing - Official docs on adelay, atrim, and amix.
FFmpeg Filters - sidechaincompress - Ducking parameters.
BBC audiowaveform GitHub - High-speed waveform generation.

Open Questions

Docker Container Tooling: Is audiowaveform installed in our base container? If not, we will need to update the Dockerfile or rely solely on FFmpeg’s showwavespic.
amix Volume Scaling: Since normalize=0 might still require manually tweaking the base volume levels of inputs to avoid clipping, we need a reliable programmatic way in our Go tool to normalize voice and music stems before they hit the mix phase.