Research

Audio CLI Editing Reference

Summary

This report outlines practical techniques for CLI-based audio post-production: multitrack timeline editing, visual waveform and spectrogram inspection, and automatic ducking of music under voice-over (VO). FFmpeg is the foundation for all three, and its filter_complex option is central to both timeline editing and ducking. The audiowaveform tool generates waveform data and images for visual inspection in headless environments. These patterns are well suited to integration with our Go-based mcp-avtool-go wrapper running in Docker.

Detailed Findings

1. CLI-Based Multitrack Audio Timeline Editing

FFmpeg’s filter_complex acts as a headless non-linear editor. Building an audio timeline primarily relies on the adelay filter (for clip positioning), atrim (for cutting), and amix (for compositing tracks).

Recipe: 3-Track Mix with Delays and Fades

To lay out three tracks where Track 1 starts immediately at 80% gain, Track 2 enters at 5 seconds with a 2-second fade-in, and Track 3 starts at 10 seconds at 50% gain:

ffmpeg -i track1.mp3 -i track2.mp3 -i track3.mp3 -filter_complex \
"[0:a]volume=0.8[t1]; \
 [1:a]adelay=5000|5000,afade=t=in:st=5:d=2[t2]; \
 [2:a]adelay=10000|10000,volume=0.5[t3]; \
 [t1][t2][t3]amix=inputs=3:normalize=0[out]" \
 -map "[out]" output.mp3
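The same graph can be generated from data instead of being written by hand. The following is a minimal Go sketch, not the mcp-avtool-go implementation: the Clip type, its fields, and BuildTimelineGraph are hypothetical names introduced for illustration, and the sketch assumes every input is a separate file mapped in order.

```go
package main

import (
	"fmt"
	"strings"
)

// Clip is a hypothetical description of one timeline entry; the field names
// are illustrative and not part of mcp-avtool-go.
type Clip struct {
	StartMs int     // position on the timeline, in milliseconds
	Gain    float64 // linear gain; 1.0 leaves the level unchanged
	FadeInS float64 // fade-in length in seconds; 0 disables the fade
}

// BuildTimelineGraph returns a filter_complex string that delays, fades and
// scales each input, then sums all of them with amix into the [out] pad.
func BuildTimelineGraph(clips []Clip) string {
	var chains []string
	var labels strings.Builder
	for i, c := range clips {
		var filters []string
		if c.StartMs > 0 {
			// One delay value per channel; "d|d" covers stereo input.
			filters = append(filters, fmt.Sprintf("adelay=%d|%d", c.StartMs, c.StartMs))
		}
		if c.FadeInS > 0 {
			// After adelay the clip begins at StartMs, so the fade starts there too.
			filters = append(filters, fmt.Sprintf("afade=t=in:st=%g:d=%g", float64(c.StartMs)/1000, c.FadeInS))
		}
		if c.Gain != 1 {
			filters = append(filters, fmt.Sprintf("volume=%g", c.Gain))
		}
		if len(filters) == 0 {
			filters = append(filters, "anull") // pass-through keeps the chain valid
		}
		chains = append(chains, fmt.Sprintf("[%d:a]%s[t%d]", i, strings.Join(filters, ","), i+1))
		fmt.Fprintf(&labels, "[t%d]", i+1)
	}
	mix := fmt.Sprintf("%samix=inputs=%d:normalize=0[out]", labels.String(), len(clips))
	return strings.Join(append(chains, mix), ";")
}

func main() {
	// Reproduces the three-track recipe above.
	fmt.Println(BuildTimelineGraph([]Clip{
		{StartMs: 0, Gain: 0.8},
		{StartMs: 5000, Gain: 1, FadeInS: 2},
		{StartMs: 10000, Gain: 0.5},
	}))
}
```

The returned string is what goes after -filter_complex; the caller still supplies one -i per clip and -map "[out]".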

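Go Integration Note: For complex timelines with dozens of clips, a filtergraph passed inline can exceed the operating system's argument-length limit and becomes hard to debug. The Go tool should write the filtergraph string to a temporary .txt file and pass it via -filter_complex_script. Recent FFmpeg releases deprecate that option in favor of -/filter_complex <file>, so check which form the FFmpeg build in the container accepts.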
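A minimal sketch of that approach follows. ScriptArgs is a hypothetical helper, not an existing mcp-avtool-go function, and it assumes the graph ends in an [out] pad and that the installed FFmpeg accepts -filter_complex_script.

```go
package main

import (
	"fmt"
	"os"
)

// ScriptArgs writes graph to a temporary file and returns ffmpeg arguments
// that load it with -filter_complex_script, plus a function that deletes
// the temporary file once ffmpeg has finished.
func ScriptArgs(inputs []string, graph, output string) ([]string, func(), error) {
	f, err := os.CreateTemp("", "filtergraph-*.txt")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { os.Remove(f.Name()) }
	if _, err := f.WriteString(graph); err != nil {
		f.Close()
		cleanup()
		return nil, nil, err
	}
	if err := f.Close(); err != nil {
		cleanup()
		return nil, nil, err
	}
	var args []string
	for _, in := range inputs {
		args = append(args, "-i", in)
	}
	// Assumption: the graph labels its final pad [out].
	args = append(args, "-filter_complex_script", f.Name(), "-map", "[out]", output)
	return args, cleanup, nil
}

func main() {
	args, cleanup, err := ScriptArgs(
		[]string{"track1.mp3", "track2.mp3"},
		"[0:a][1:a]amix=inputs=2:normalize=0[out]",
		"output.mp3",
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer cleanup()
	// In real use, pass args to exec.Command("ffmpeg", args...).
	fmt.Println(args)
}
```

Keeping the script file around on failure (by skipping cleanup) can help when debugging a graph that FFmpeg rejects.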

2. Visual Inspection of Multitrack Audio

For headless environments and agents, generating image representations of audio waveforms and spectrograms is essential for debugging timing and frequency overlaps.

A. audiowaveform (BBC)

Recommended for waveforms due to its speed and precision. It generates waveform data (binary .dat or JSON) and PNG images directly from audio files.
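As a sketch of how the wrapper might call it, the helpers below build argument lists for a PNG render and a JSON export. The function names are hypothetical, and the flags follow the audiowaveform README as we understand it; confirm them against audiowaveform --help for the version installed in the container.

```go
package main

import (
	"fmt"
	"strconv"
)

// WaveformPNGArgs returns audiowaveform arguments that render input to a PNG
// of the given pixel size, scaling the whole file to fit the image width.
func WaveformPNGArgs(input, output string, width, height int) []string {
	return []string{
		"-i", input,
		"-o", output, // the .png extension selects image output
		"--zoom", "auto", // fit the full duration into the width
		"--width", strconv.Itoa(width),
		"--height", strconv.Itoa(height),
	}
}

// WaveformJSONArgs returns arguments that write 8-bit peak data as JSON at
// the given horizontal resolution.
func WaveformJSONArgs(input, output string, pixelsPerSecond int) []string {
	return []string{
		"-i", input,
		"-o", output, // the .json extension selects JSON output
		"--pixels-per-second", strconv.Itoa(pixelsPerSecond),
		"--bits", "8",
	}
}

func main() {
	// In real use, pass each slice to exec.Command("audiowaveform", args...).
	fmt.Println(WaveformPNGArgs("mix.mp3", "mix.png", 1200, 200))
	fmt.Println(WaveformJSONArgs("mix.mp3", "mix.json", 20))
}
```

The JSON form is convenient for agents that need numeric peak values rather than an image.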

B. FFmpeg (showwavespic and showspectrumpic)

If audiowaveform is not installed in the Docker container, FFmpeg's built-in showwavespic and showspectrumpic filters render a waveform or a spectrogram of the whole input as a single image.
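A minimal sketch of the corresponding FFmpeg invocations, again as argument builders. The function names and the example image sizes are assumptions chosen for illustration; the s (size) and legend options are documented filter parameters.

```go
package main

import "fmt"

// WavePicArgs returns ffmpeg arguments that draw the waveform of input into
// a single image of the given size (for example "1280x240").
func WavePicArgs(input, output, size string) []string {
	return []string{
		"-i", input,
		"-filter_complex", "showwavespic=s=" + size,
		"-frames:v", "1", // the filter emits one picture; write exactly one frame
		output,
	}
}

// SpectrumPicArgs returns ffmpeg arguments that draw a spectrogram of input,
// with the frequency and time legend enabled.
func SpectrumPicArgs(input, output, size string) []string {
	return []string{
		"-i", input,
		"-filter_complex", "showspectrumpic=s=" + size + ":legend=1",
		"-frames:v", "1",
		output,
	}
}

func main() {
	// In real use, pass each slice to exec.Command("ffmpeg", args...).
	fmt.Println(WavePicArgs("mix.mp3", "waveform.png", "1280x240"))
	fmt.Println(SpectrumPicArgs("mix.mp3", "spectrum.png", "1024x512"))
}
```

Rendering each track with the same size makes the resulting images directly comparable when checking timing overlaps.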

3. Music Ducking During Voice-Over

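Automatic ducking lowers the background music whenever the voice track is active. In FFmpeg the usual tool for this is the sidechaincompress filter, which compresses its first input (the music) according to the level of its second input (the voice).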

Recipe: Standard Ducking

  1. Split the voice track (asplit).
  2. Use one split to trigger the compressor on the background music.
  3. Mix the compressed music with the other voice split.

ffmpeg -i music.mp3 -i voice.mp3 -filter_complex \
"[1:a]asplit=2[sc][voice_mix]; \
 [0:a][sc]sidechaincompress=threshold=0.03:ratio=5:attack=20:release=300[bg_ducked]; \
 [bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]" \
-map "[out]" output.mp3

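Recipe: Advanced “Lookahead” Ducking

To prevent a brief volume blip at the very beginning of words, trim the first 50 ms from the sidechain copy of the voice. The trigger then runs 50 ms ahead of the audible voice, so the music is already ducking when each word starts. Resetting timestamps with asetpts after the trim keeps the shifted copy starting at zero:

ffmpeg -i music.mp3 -i voice.mp3 -filter_complex \
"[1:a]asplit=2[sc_raw][voice_mix]; \
 [sc_raw]atrim=start=0.05,asetpts=PTS-STARTPTS[sc_shifted]; \
 [0:a][sc_shifted]sidechaincompress=threshold=0.03:ratio=5:attack=10:release=300[bg_ducked]; \
 [bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]" \
-map "[out]" output.mp3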
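Both ducking recipes differ only in their parameters and the optional sidechain trim, so the wrapper could generate them from one function. The sketch below is illustrative: DuckOptions and BuildDuckGraph are hypothetical names, input 0 is assumed to be the music and input 1 the voice, and the trimmed sidechain has its timestamps reset with asetpts as a precaution.

```go
package main

import (
	"fmt"
	"strings"
)

// DuckOptions is a hypothetical parameter set for sidechain ducking.
type DuckOptions struct {
	Threshold  float64 // linear level that triggers ducking (0.03 is about -30 dB)
	Ratio      float64 // compression ratio applied to the music
	AttackMs   float64 // time for the music to duck, in milliseconds
	ReleaseMs  float64 // time for the music to recover, in milliseconds
	LookaheadS float64 // seconds trimmed from the sidechain; 0 disables lookahead
}

// BuildDuckGraph returns a filter_complex string that ducks input 0 (music)
// under input 1 (voice) and mixes both into the [out] pad.
func BuildDuckGraph(o DuckOptions) string {
	var parts []string
	if o.LookaheadS > 0 {
		// Trim the trigger copy so the compressor reacts ahead of the voice.
		parts = append(parts,
			"[1:a]asplit=2[sc_raw][voice_mix]",
			fmt.Sprintf("[sc_raw]atrim=start=%g,asetpts=PTS-STARTPTS[sc]", o.LookaheadS))
	} else {
		parts = append(parts, "[1:a]asplit=2[sc][voice_mix]")
	}
	parts = append(parts,
		fmt.Sprintf("[0:a][sc]sidechaincompress=threshold=%g:ratio=%g:attack=%g:release=%g[bg_ducked]",
			o.Threshold, o.Ratio, o.AttackMs, o.ReleaseMs),
		// duration=first ends the mix when the music (first amix input) ends.
		"[bg_ducked][voice_mix]amix=inputs=2:duration=first:normalize=0[out]")
	return strings.Join(parts, ";")
}

func main() {
	standard := DuckOptions{Threshold: 0.03, Ratio: 5, AttackMs: 20, ReleaseMs: 300}
	lookahead := DuckOptions{Threshold: 0.03, Ratio: 5, AttackMs: 10, ReleaseMs: 300, LookaheadS: 0.05}
	fmt.Println(BuildDuckGraph(standard))
	fmt.Println(BuildDuckGraph(lookahead))
}
```

Exposing the threshold and release as parameters lets callers tune how aggressively and how long the music stays ducked without editing graph strings.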

Sources

Open Questions