Timeline Architecture — “The Ferret Incident”
Author: rho-editor | Date: 2026-05-20 | Step: 6 prep (during Step 5)
Format
Using genmedia-assemble timeline (timeline.json). This gives us precise start/end times, multi-clip audio tracks, gaps, crossfades, and source trimming.
Track Structure
| Track ID | Type | Role | Volume | Ducking | Notes |
|---|---|---|---|---|---|
| V1 | video | — | mute | — | All 44 shots. Mute Veo-generated audio. |
| VO | audio | voice | +3 dB | — | 16 narrator VO stems (@atempo 1.1x) |
| DLG | audio | voice | 0 dB | — | 5 dialogue stems (raw, no atempo) |
| SCORE | audio | music | -2 dB | duck under “voice”, -12 dB | Per-movement score segments |
| SFX | audio | sfx | -3 dB | — | Bell dings, clock ticks, crystal shatter, pen scratch |
| AMB | audio | sfx | -18 dB | — | Lobby ambience (low continuous bed) |
Ducking Strategy (from Musical Arc)
- Score ducks -12 dB under voice (both VO and DLG tracks share “voice” role)
- VO boosted +3 dB per USAGE.md recommendation (TTS output tends quiet)
- AMB at -18 dB — barely perceptible texture, never competes
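For a sense of what these levels mean in linear terms, the sketch below converts the dB figures to amplitude multipliers with the standard 20·log10 relation. The names `db_to_gain`, `TRACK_DB`, and `score_gain` are illustrative, and the assumption that the duck depth simply adds to the SCORE track volume (-2 dB + -12 dB = -14 dB) is ours; how genmedia-assemble applies ducking internally is not specified in this doc.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


# Track volumes from the Track Structure table, in dB.
TRACK_DB = {"VO": 3.0, "DLG": 0.0, "SCORE": -2.0, "SFX": -3.0, "AMB": -18.0}
DUCK_DB = -12.0  # extra attenuation on SCORE while a "voice" track is active


def score_gain(voice_active: bool) -> float:
    """Linear gain for SCORE, ASSUMING duck depth adds to the track volume."""
    db = TRACK_DB["SCORE"] + (DUCK_DB if voice_active else 0.0)
    return db_to_gain(db)


for name, db in TRACK_DB.items():
    print(f"{name}: {db:+.0f} dB -> x{db_to_gain(db):.3f}")
print(f"SCORE while ducked: x{score_gain(True):.3f}")
```

Under that assumption the score drops to roughly a fifth of full amplitude under narration, while AMB sits at about an eighth.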
Video Track (V1) — Shot Timing Skeleton
Scene Transitions
- Hard cuts within scenes (fast pacing, especially Scene 2)
- 1.0s crossfades between scenes (5 transitions total)
- Master fade_in: 1.0s (from black)
- Master fade_out: 2.0s (to black — mandatory per role guide)
Title Card & Credits
- Title card: 8s black + text overlay (pre-scene 0, or separate shot)
- Credits: 12s (post-scene 5, or separate shot)
Shot Duration Reference
All durations in seconds.
- Scene 0 (8s): 0.1=8
- Scene 1 (40s): 1.1=4, 1.2=9, 1.3=9, 1.4=8, 1.5=10
- Scene 2 (97s): 2.1=5, 2.2=4, 2.3=4, 2.4=4, 2.5=4, 2.6=5, 2.7=5, 2.8=4, 2.9=5, 2.10=4, 2.11=4, 2.12=6, 2.13=4, 2.14=5, 2.15=4, 2.16=4, 2.17=5, 2.18=4, 2.19=5, 2.20=5, 2.21=5
- Scene 3 (39s): 3.1=6, 3.2=10, 3.3=4, 3.4=4, 3.5=6, 3.6=5, 3.7=4
- Scene 4 (56s): 4.1=10, 4.2=7, 4.3=10, 4.4=4, 4.5=6, 4.6=8, 4.7=7, 4.8=4
- Scene 5 (16s): 5.1=6, 5.2=10

Raw total: 256s. Crossfade overlaps (5 × 1.0s): -5s → 251s. With branding (8s title card + 12s credits): ~271s = 4:31.
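The runtime arithmetic can be re-derived from the per-shot list with a short script. This is a sketch only: the names are illustrative, and it assumes one 1.0s overlap per scene boundary plus the 8s title card and 12s credits as separate branding. Note that the Scene 2 shots as listed sum to 95s rather than the 97s in the scene heading, so the script lands 2s under the stated totals; one of the two needs reconciling before the timeline is built.

```python
# Per-shot durations in seconds, copied from the Shot Duration Reference.
SHOT_DURATIONS = {
    0: [8],
    1: [4, 9, 9, 8, 10],
    2: [5, 4, 4, 4, 4, 5, 5, 4, 5, 4, 4, 6, 4, 5, 4, 4, 5, 4, 5, 5, 5],
    3: [6, 10, 4, 4, 6, 5, 4],
    4: [10, 7, 10, 4, 6, 8, 7, 4],
    5: [6, 10],
}
CROSSFADE_S = 1.0    # one crossfade between each pair of adjacent scenes
BRANDING_S = 8 + 12  # title card + credits


def scene_totals(shots):
    """Sum the listed shot durations for each scene."""
    return {scene: sum(durations) for scene, durations in shots.items()}


def runtime_seconds(shots, crossfade=CROSSFADE_S, branding=BRANDING_S):
    """Raw total minus crossfade overlaps, plus branding."""
    raw = sum(scene_totals(shots).values())
    overlaps = (len(shots) - 1) * crossfade
    return raw - overlaps + branding


def as_min_sec(seconds):
    """Format a duration in seconds as M:SS."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


print(scene_totals(SHOT_DURATIONS))
print(as_min_sec(runtime_seconds(SHOT_DURATIONS)))
```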
Voice Track (VO) — Stem Placement
All VO stems processed with ffmpeg -af atempo=1.1 before placement.
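A minimal sketch of that preprocessing step, assuming stems are plain audio files and that a sped-up stem lasts its raw duration divided by the tempo factor. The function names and file paths are hypothetical; only the `-af atempo=1.1` filter comes from this doc.

```python
ATEMPO = 1.1  # tempo factor applied to every narrator VO stem


def atempo_command(src: str, dst: str, tempo: float = ATEMPO) -> list[str]:
    """Build the ffmpeg argv that speeds up `src` and writes `dst`."""
    # -y overwrites an existing output file; paths are placeholders.
    return ["ffmpeg", "-y", "-i", src, "-af", f"atempo={tempo}", dst]


def sped_up_duration(raw_seconds: float, tempo: float = ATEMPO) -> float:
    """Duration of a stem after atempo: raw length divided by the factor."""
    return raw_seconds / tempo


# Example with hypothetical file names; pass the list to subprocess.run to execute.
print(" ".join(atempo_command("vo_1_2_raw.wav", "vo_1_2.wav")))
print(round(sped_up_duration(11.0), 2))
```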
| Stem | Shot | Shot Start* | Placement Offset | VO Start* | VO End* | Notes |
|---|---|---|---|---|---|---|
| vo_0_1 | 0.1 | 0s | 0.5s | 0.5s | 6.83s | |
| vo_1_2 | 1.2 | 16s | 0s | 16s | 27.4s | Bleeds 2.4s into 1.3 |
| vo_1_3 | 1.3 | 25s | 3.0s | 28s | 33.85s | After vo_1_2 bleed ends |
| vo_1_4 | 1.4 | 34s | 0.5s | 34.5s | 41.74s | |
| vo_2_2 | 2.2 | 53s | 0s | 53s | 56.16s | |
| vo_2_4 | 2.4 | 61s | 0s | 61s | 66.7s | Bleeds 1.7s into 2.5 |
| vo_2_7 | 2.7 | 74s | 0s | 74s | 78.62s | |
| vo_2_9 | 2.9 | 83s | 0s | 83s | 87.69s | |
| vo_2_12 | 2.12 | 95s | 0s | 95s | 99.55s | |
| vo_2_15 | 2.15 | 109s | 2s | 111s | 113.29s | After bell slap |
| vo_2_19 | 2.19 | 126s | 0s | 126s | 131.13s | Bleeds 0.13s into 2.20 |
| vo_3_1 | 3.1 | 144s | 0s | 144s | 150.51s | Bleeds 0.51s into 3.2 |
| vo_3_5 | 3.5 | 168s | 2s | 170s | 174.95s | After 2s silence; bleeds 0.95s into 3.6 |
| vo_4_1 | 4.1 | 183s | 1s | 184s | 190.8s | |
| vo_4_3 | 4.3 | 200s | 1s | 201s | 208.78s | |
| vo_5_2 | 5.2 | 245s | 0s | 245s | 252.96s | Final VO |
*All times approximate — exact values calculated by timeline-helper from cumulative shot durations including crossfade overlaps.
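The columns in the table relate as VO Start = Shot Start + Placement Offset, VO End = VO Start + stem duration, and bleed = VO End minus the end of the host shot. A small sketch of that calculation follows; `Placement` and `place_stem` are illustrative names, not part of timeline-helper, and the 11.4s stem length for vo_1_2 is inferred from the table's 16s to 27.4s span.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    start: float  # absolute timeline start of the stem, in seconds
    end: float    # absolute timeline end of the stem, in seconds
    bleed: float  # seconds the stem runs past its host shot (0 if contained)


def place_stem(shot_start: float, shot_duration: float,
               offset: float, stem_duration: float) -> Placement:
    """Place a stem `offset` seconds into its shot and measure any bleed."""
    start = shot_start + offset
    end = start + stem_duration
    bleed = max(0.0, end - (shot_start + shot_duration))
    # Round to centiseconds to match the precision used in the table.
    return Placement(start, round(end, 2), round(bleed, 2))


# vo_1_2: shot 1.2 starts at ~16s and runs 9s; the stem lasts ~11.4s.
vo_1_2 = place_stem(shot_start=16, shot_duration=9, offset=0, stem_duration=11.4)
print(vo_1_2)
```

The same function can confirm that the next stem clears the bleed: vo_1_3 starts at 28s, after vo_1_2 ends at 27.4s.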
Dialogue Track (DLG) — Stem Placement
| Stem | Shot | Offset in Shot | Notes |
|---|---|---|---|
| dlg_2_1 | 2.1 | 2s | Arthur: “Shoo.” |
| dlg_4_2 | 4.2 | 2s | Arthur: “Welcome to the Grand Lavender, Mr. Vance.” |
| dlg_4_5 | 4.5 | 0.5s | Vance: “Pendelton… ferret… chandelier.” |
| dlg_4_6 | 4.6 | 2s | Arthur: “Yes, sir… eleven.” |
| dlg_4_7 | 4.7 | 2s | Vance: “Noted.” |
SFX Track — Key Sound Events
| Sound | Shot | Timing | Notes |
|---|---|---|---|
| Bell ding ×3 | 1.3 | Evenly spaced across shot | Opening ritual |
| Bell slap ×1 | 2.15 | ~1.5s | Ferret hits bell |
| Clock tick (ambient) | 1.1, 2.5, 2.18, 3.7 | Throughout | Ticking under clock ECU shots |
| Fern crash | 2.3→2.4 | End of 2.3 / start of 2.4 | Off-screen impact |
| Suitcase slam | 2.12 | ~2s into shot | Arthur slams |
| Crystal drop + shatter | 3.3→3.5 | End of 3.4, start of 3.5 | Implied fall |
| Pen scratch | 4.4 | Throughout | Vance writing |
| Door open/close | 4.1, 4.8 | Start/end of shots | Heavy brass doors |
| Final bell ding ×1 | 5.2 | ~6s into shot | Bookend payoff |
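"Evenly spaced across shot" leaves the exact offsets open. One possible reading, sketched below, divides the shot into equal slots and places each ding at the centre of its slot, which avoids a hit landing on the cut itself. The function name and the slot-centre convention are assumptions, not something this doc specifies.

```python
def evenly_spaced_offsets(shot_duration: float, count: int) -> list[float]:
    """Offsets (seconds into the shot) at the centre of `count` equal slots."""
    if count <= 0:
        raise ValueError("count must be positive")
    slot = shot_duration / count
    return [slot * (i + 0.5) for i in range(count)]


# Shot 1.3 runs 9s and carries three bell dings.
print(evenly_spaced_offsets(9, 3))
```

For shot 1.3 this gives dings at 1.5s, 4.5s, and 7.5s into the shot.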
Score Track — Per-Movement Segments
| Movement | Shots | Duration | Tempo | Instruments | Notes |
|---|---|---|---|---|---|
| I. Clockwork | 0.1-1.5 | ~48s | 80 BPM | Glockenspiel, celesta, pizzicato | Precise, metronomic |
| II. Discovery | 2.1-2.5 | ~21s | 80 BPM | Score strips down | Near-silence, tension |
| III. Escalation | 2.6-2.21 | ~76s | 100 BPM | Plucked cello, muted trumpet | Quick staccato phrases |
| IV. Chandelier | 3.1-3.7 | ~39s | 70→STOP | Build then 2s dead silence | Score STOPS at crystal drop |
| V. Inspector | 4.1-4.8 | ~56s | 60 BPM | Sparse icy strings | Pen scratch replaces score |
| VI. Return | 5.1-5.2 | ~16s | 80 BPM | Opening motif returns | Final ding, fade to black |
Assembly Notes
- Working directory mandate: Must cd /workspace/shared-dirs/rho-team/ before running genmedia-assemble.
- V1 audio muted: All Veo clips generate audio by default. Must mute on V1 track.
- source_in/source_out: Use to trim Veo overhang (clips are generated ~4s longer than needed per Overhang Principle).
- Timeline-helper delegation: Spin up timeline-helper agent with this doc + final dailies filenames to calculate exact timestamps and produce the JSON.
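As an illustration of the source_in/source_out trimming described above, the sketch below picks a window of the needed length out of a longer generated clip. Everything here other than the field names source_in and source_out is hypothetical: the `trim_window` helper, the `source` key, the file name, and the choice to keep the head of the clip and drop the overhang from the tail. The actual timeline.json schema should be taken from the genmedia-assemble documentation.

```python
import json


def trim_window(needed_s: float, generated_s: float,
                lead_in_s: float = 0.0) -> tuple[float, float]:
    """Return (source_in, source_out) keeping `needed_s` seconds of the clip."""
    source_out = lead_in_s + needed_s
    if source_out > generated_s:
        raise ValueError("clip is shorter than the requested window")
    return lead_in_s, source_out


# Shot 2.12 needs 6s; with ~4s of overhang the generated clip is ~10s long.
source_in, source_out = trim_window(needed_s=6, generated_s=10)

# Hypothetical clip entry; only source_in/source_out are named in this doc.
clip_entry = {
    "source": "shot_2_12.mp4",
    "source_in": source_in,
    "source_out": source_out,
}
print(json.dumps(clip_entry, indent=2))
```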