Voice Stem Duration Audit — Mandatory Gate
Author: rho-editor | Date: 2026-05-20
All 21 stems (16 VO, 5 dialogue) verified via ffprobe. The VO stems widely exceed their planned VO windows. This is systemic, not exceptional: TTS pacing was slower than scripted.
Solution Strategy
Three tools, in order of preference:
- Global atempo 1.1x: play all VO stems 10% faster, which shortens each by about 9% (duration ÷ 1.1). Transparent for narration, preserves deadpan delivery.
- VO bleed across cuts — narrator voice continues over visual transitions. Standard film practice. The VO track is independent of the visual track.
- Shot extension — add seconds to shots where VO needs room. Only for Scene 1 (the opening), which was under Musical Arc target anyway.
NOT using: atempo >1.25x (kills the deadpan pacing) or mass regens (risks losing good delivery).
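As a rough illustration of the first tool, the sketch below computes a stem's duration after a tempo change and builds (without running) ffprobe and ffmpeg argument lists. The function names and file names are hypothetical and not taken from the project; the commands assume the standard ffmpeg `atempo` audio filter and ffprobe's `format=duration` entry.

```python
def tempo_adjusted_duration(raw_seconds: float, tempo: float = 1.1) -> float:
    """Duration after speeding playback up by `tempo` (duration / tempo)."""
    return raw_seconds / tempo


def ffprobe_duration_command(path: str) -> list[str]:
    """Argument list that asks ffprobe to print a file's duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def atempo_command(src: str, dst: str, tempo: float = 1.1) -> list[str]:
    """Argument list that applies ffmpeg's atempo filter to one stem."""
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={tempo}", dst]


# Example with the longest VO stem from this audit (file names are hypothetical).
adjusted = round(tempo_adjusted_duration(12.56), 2)   # 11.42
command = atempo_command("vo_1_2.wav", "vo_1_2_t110.wav")
```

The lists can be passed to `subprocess.run` once the real stem paths are known. Note that 1.1x shortens a stem by 1 - 1/1.1, roughly 9.1%, rather than a full 10%.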
Scene 1 — Shot Extensions Required
Scene 1 is the critical bottleneck: three consecutive VO shots whose stems overrun their windows, by up to ~2x (vo_1_2 is 12.56s raw against a 6s shot). Scene 1 was also at 33s vs the 40-50s Musical Arc target, so the extensions improve pacing.
| Shot | Current | Extended | Rationale |
|---|---|---|---|
| 1.2 | 6s | 9s | vo_1_2 (12.56s→11.42s @1.1x) starts at 0s, bleeds 2.4s into 1.3 |
| 1.3 | 7s | 9s | vo_1_2 bleed ends at 2.4s. Pause. vo_1_3 (6.44s→5.85s @1.1x) starts at 3s, ends at 8.85s. Fits. |
| 1.4 | 6s | 8s | vo_1_4 (7.96s→7.24s @1.1x) starts at 0.5s, ends at 7.74s. Fits. |
New Scene 1 total: 4 + 9 + 9 + 8 + 10 = 40s (was 33s). Now ON TARGET for Musical Arc Movement I (40-50s).
New raw film total: 251s. With branding: ~266s = 4:26. Still within 3:00-5:00. ✅
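The totals above can be re-derived with a few lines of arithmetic. This is only a sketch: the 4s and 10s shots are not named in this report, so their labels below are placeholders, and the 266s figure is the approximate with-branding total quoted above.

```python
# (label, original seconds, extended seconds); "first"/"last" are placeholder labels.
SCENE_1_SHOTS = [
    ("first", 4, 4),
    ("1.2", 6, 9),
    ("1.3", 7, 9),
    ("1.4", 6, 8),
    ("last", 10, 10),
]


def as_min_sec(total_seconds: int) -> str:
    """Format whole seconds as m:ss."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


scene_before = sum(original for _, original, _ in SCENE_1_SHOTS)   # 33
scene_after = sum(extended for _, _, extended in SCENE_1_SHOTS)    # 40
added = scene_after - scene_before                                  # 7

with_branding = 266                       # approximate total from this report
runtime = as_min_sec(with_branding)       # "4:26"
within_limits = 180 <= with_branding <= 300   # 3:00 to 5:00 window
on_target = 40 <= scene_after <= 50           # Musical Arc Movement I
```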
Full Stem Map (with fixes applied)
VO STEMS — @atempo 1.1x globally
| Stem | Raw | @1.1x | Shot | Shot Dur | Placement | Status |
|---|---|---|---|---|---|---|
| vo_0_1 | 6.96s | 6.33s | 0.1 | 8s | 0.5s→6.83s | ✅ Fits |
| vo_1_2 | 12.56s | 11.42s | 1.2 | 9s* | 0s→9s, bleeds 2.4s into 1.3 | ✅ Bleed |
| vo_1_3 | 6.44s | 5.85s | 1.3 | 9s* | 3.0s→8.85s | ✅ Fits |
| vo_1_4 | 7.96s | 7.24s | 1.4 | 8s* | 0.5s→7.74s | ✅ Fits |
| vo_2_2 | 3.48s | 3.16s | 2.2 | 4s | 0s→3.16s | ✅ Fits |
| vo_2_4 | 6.28s | 5.71s | 2.4 | 4s | 0s→4s, bleeds 1.7s into 2.5 (SILENT) | ✅ Bleed |
| vo_2_7 | 5.08s | 4.62s | 2.7 | 5s | 0s→4.62s | ✅ Fits |
| vo_2_9 | 5.16s | 4.69s | 2.9 | 5s | 0s→4.69s | ✅ Fits |
| vo_2_12 | 5.00s | 4.55s | 2.12 | 6s | 0s→4.55s | ✅ Fits |
| vo_2_15 | 2.52s | 2.29s | 2.15 | 4s | 2s→4.29s, runs 0.29s past the 4s shot | ⚠️ 0.29s overrun (after bell slap) |
| vo_2_19 | 5.64s | 5.13s | 2.19 | 5s | 0s→5s, bleeds 0.13s into 2.20 (SILENT) | ✅ Trivial bleed |
| vo_3_1 | 7.16s | 6.51s | 3.1 | 6s | 0s→6s, bleeds 0.51s into 3.2 (SILENT) | ✅ Trivial bleed |
| vo_3_5 | 5.44s | 4.95s | 3.5 | 6s | 2s→6.95s, bleeds 0.95s into 3.6 (SILENT) | ✅ Trivial bleed |
| vo_4_1 | 7.48s | 6.80s | 4.1 | 10s | 1s→7.80s | ✅ Fits |
| vo_4_3 | 8.56s | 7.78s | 4.3 | 10s | 1s→8.78s | ✅ Fits |
| vo_5_2 | 8.76s | 7.96s | 5.2 | 10s | 0s→7.96s | ✅ Fits |
*Extended shots
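The Placement and Status columns follow from three numbers per stem: raw duration, start offset within the shot, and shot duration. The sketch below shows that calculation for a few rows; the `place` helper and its return fields are illustrative names, not part of the actual timeline tooling.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    adjusted: float  # stem duration after atempo, in seconds
    end: float       # end time relative to the start of the shot
    bleed: float     # seconds running past the end of the shot (0 if it fits)


def place(raw: float, start: float, shot_dur: float, tempo: float = 1.1) -> Placement:
    """Compute where a stem ends and how far it bleeds into the next shot."""
    adjusted = round(raw / tempo, 2)
    end = round(start + adjusted, 2)
    bleed = round(max(0.0, end - shot_dur), 2)
    return Placement(adjusted, end, bleed)


# Rows taken from the table above: (raw, start, shot duration).
vo_0_1 = place(6.96, 0.5, 8)   # fits: ends at 6.83s
vo_2_4 = place(6.28, 0.0, 4)   # bleeds 1.71s into the next shot
vo_3_5 = place(5.44, 2.0, 6)   # bleeds 0.95s into the next shot

for name, p in [("vo_0_1", vo_0_1), ("vo_2_4", vo_2_4), ("vo_3_5", vo_3_5)]:
    status = "fits" if p.bleed == 0 else f"bleeds {p.bleed}s"
    print(f"{name}: {p.adjusted}s, ends at {p.end}s, {status}")
```

Running every row through a check like this is a cheap way to confirm that each Status cell matches its Placement.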
DIALOGUE STEMS — no atempo needed
| Stem | Duration | Shot | Shot Dur | Status |
|---|---|---|---|---|
| dlg_2_1 (Arthur: “Shoo”) | 2.92s | 2.1 | 5s | ✅ Fits |
| dlg_4_2 (Arthur: “Welcome…”) | 4.20s | 4.2 | 7s | ✅ Fits |
| dlg_4_5 (Vance: “Pendelton…ferret…”) | 5.16s | 4.5 | 6s | ✅ Fits (tight: place at 0.5s) |
| dlg_4_6 (Arthur: “Yes, sir…eleven”) | 3.84s | 4.6 | 8s | ✅ Fits |
| dlg_4_7 (Vance: “Noted”) | 3.92s | 4.7 | 7s | ✅ Fits |
Safety Filter Note
vo_2_19 was rephrased from “Chaos must be contained” to “Disorder must be addressed. There was simply no alternative.” — needs rho-idea creative sign-off. The replacement is slightly longer and less punchy. If rho-idea approves, it works. If not, a regen with the shorter original text should be attempted.
Summary
| Metric | Value |
|---|---|
| Stems verified | 21/21 |
| Atempo applied | 1.1x on all VO stems (dialogue stems untouched) |
| Shot extensions | 3 (shots 1.2, 1.3, 1.4: +7s total) |
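| VO bleeds required | 5 (4 into adjacent SILENT shots; vo_1_2 into the first 2.4s of shot 1.3, before vo_1_3 starts at 3s) |
| Regens needed | 0 (vo_2_19 only if rho-idea rejects the rephrase) |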
| Clipped stems | 0 |
VERDICT: ✅ ALL STEMS RESOLVED. No clipping. source_out in the timeline will encompass full speech duration for every stem.