← Rho Artifacts | Rho Team

Audio Ducking Plan

Rho Team — "The Ferret Incident"

Ducking Plan — “The Ferret Incident”

Author: rho-editor | Date: 2026-05-20

Track-Level Ducking Configuration

The genmedia-assemble timeline format supports per-track ducking via sidechaincompress. Ducking operates at the track level, not per-clip.

TrackRoleBase VolDuck UnderDuck dBRationale
VOvoice+3 dBMaster reference. TTS output boosted per USAGE.md recommendation.
DLGvoice0 dBDialogue at natural level. Shares “voice” role with VO for ducking triggers.
SCOREmusic-2 dBvoice-12 dBDocumentary standard. Score drops ~4-5dB when any voice (VO or DLG) is active.
SFXsfx-3 dBNo ducking — SFX events are discrete and intentional.
AMBsfx-18 dBContinuous bed, too quiet to need ducking.
V1video-100 dBMuted (silences Veo-generated audio).

Per-Movement Ducking Behavior

Since ducking is track-level, we achieve the musical arc’s per-movement variation through composition, not mix automation:

MovementTimelineVO Active?DLG Active?Score StrategyEffective Result
I. Clockwork0-47sYes (vo_0_1)NoGlockenspiel at moderate level. Ducks under VO. No VO for first 0.5s → music establishes.Music audible in gaps, drops ~5dB under VO.
II. Discovery47-67sYes (vo_2_2, vo_2_4)Yes (dlg_2_1)Near-silence by design. Sustained single note. Duck unnecessary — score barely present.Functionally muted already.
III. Escalation67-140sYes (vo_2_7 through vo_2_19)NoPlucked staccato phrases. Drops under frequent VO.Score heard in brief gaps between VO segments.
IV. Chandelier140-178sYes (vo_3_1, vo_3_5)NoBuild → STOP at crystal drop. 2s dead silence.Score disappears mid-movement. Silence is the loudest moment.
V. Inspector178-233sYes (vo_4_1, vo_4_3)Yes (dlg_4_2 thru dlg_4_7)Sparse icy strings at VERY low compositional level. Dense dialogue.Score essentially inaudible under dialogue. -12dB duck + sparse writing = near-mute.
VI. Return233-249sYes (vo_5_2)NoOpening motif returns at full composition. No VO for first ~0s.Score at its most prominent. Ducks only under final VO.

Voice Activity Map (for ducking trigger analysis)

Shows where voice tracks are active, triggering score ducking:

Timeline (seconds):
0    10   20   30   40   50   60   70   80   90   100  110  120  130  140  150  160  170  180  190  200  210  220  230  240  249
|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |

VO:  [0.5----6.8]     [11----22.4]  [23-28.9] [29.5-36.7]           [53-56.2] [61-66.7]      [74-78.6] [83-87.7]    [95-99.5]    [111-113.3]           [126-131.1]                [144-150.5]       [170-175]              [184-190.8]     [201-208.8]                              [239-247]
DLG:                                                       [48-50.9]                                                                                                                              [190-194.2]       [209.5-214.7] [215-218.8] [225-228.9]

SCORE DUCKING (where voice active → score ducks -12dB):
     [===]             [========]   [=====]   [======]     [======]  [====]   [===]     [====][====]     [====]       [==]               [====]              [======]          [====]                [==========]     [========]    [=====]    [=====]    [========]

SCORE UNDUCKED (where no voice → score at full -2dB):
[.5s]      [4.2s]  [.6s]                  [.5s]    [9.3s]       [.8s]  [4.8s]   [5s]       [5.3s]  [5.5s]   [11.5s]   [12.7s]  [5s]   [8.9s]     [5s]       [8s]      [11.2s]  [.2s]    [.8s]  [.3s]   [5.2s]       [10.1s]

Key Ducking Moments

  1. Opening (0-0.5s): Score plays unducked for 0.5s before VO enters → listener hears music “step back” for narrator.
  2. Scene 2 start (47-53s): 6s unducked gap between DLG and VO → score can breathe.
  3. Escalation VO density (67-140s): Frequent VO → score ducks heavily. The staccato composition helps — notes land in gaps.
  4. Chandelier silence (162-164s): Both score AND VO drop out → 2s of ONLY clock ticks + AMB. The deepest silence in the film.
  5. Inspector dialogue cluster (188-229s): DLG + VO overlap zone → maximum ducking. Score barely audible by design.
  6. Return unducked opening (233-239s): 6s of unducked score → the only time music plays at full level. Then VO enters for the final narration.

Recommendation