Voice Stem Audit

Rho Team — "The Ferret Incident"

Voice Stem Duration Audit — Mandatory Gate

Author: rho-editor | Date: 2026-05-20

All 21 stems verified via ffprobe. 16 of 21 VO stems exceed their planned VO windows. This is systemic, not exceptional — TTS pacing was slower than scripted.
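The duration check can be reproduced with a short script. This is a minimal sketch, not the audit's actual tooling: the helper names and the example path are hypothetical, and it assumes `ffprobe` is on the PATH.

```python
import subprocess

def ffprobe_cmd(path):
    """Build an ffprobe call that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]

def parse_duration(stdout):
    """ffprobe prints a single decimal number such as '12.560000'; return it as float seconds."""
    return float(stdout.strip())

def probe_duration(path):
    """Run ffprobe on one stem and return its duration in seconds."""
    result = subprocess.run(ffprobe_cmd(path), capture_output=True, text=True, check=True)
    return parse_duration(result.stdout)
```

Calling `probe_duration("stems/vo_1_2.wav")` (a placeholder path) on each stem would give the raw durations to compare against the planned windows.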

Solution Strategy

Three tools, in order of preference:

  1. Global atempo 1.1x: play every stem 10% faster, which shortens each to raw/1.1 (about 9% shorter). Transparent for narration, preserves deadpan delivery.
  2. VO bleed across cuts: the narrator's voice continues over visual transitions. Standard film practice; the VO track is independent of the visual track.
  3. Shot extension: add seconds to shots where VO needs room. Used only in Scene 1 (the opening), which was under its Musical Arc target anyway.

NOT using: atempo >1.25x (kills the deadpan pacing) or mass regens (risks losing good delivery).
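The effect of atempo on stem length is simple arithmetic: a stem played at a given tempo lasts raw / tempo. The sketch below illustrates that and builds an ffmpeg call using the `atempo` audio filter. The function names and file arguments are illustrative assumptions, and the 1.25x ceiling is taken from the note above.

```python
MAX_TEMPO = 1.25  # ceiling from the audit; anything faster was ruled out

def stretched_duration(raw_s, tempo=1.1):
    """Duration after atempo: playback is `tempo` times faster, so length is raw / tempo."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    if tempo > MAX_TEMPO:
        raise ValueError(f"tempo {tempo} exceeds the {MAX_TEMPO}x ceiling")
    return raw_s / tempo

def atempo_cmd(src, dst, tempo=1.1):
    """ffmpeg invocation applying the atempo audio filter (src and dst are placeholders)."""
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={tempo}", dst]

print(round(stretched_duration(12.56), 2))  # vo_1_2 -> 11.42
print(round(stretched_duration(6.44), 2))   # vo_1_3 -> 5.85
```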


Scene 1 — Shot Extensions Required

Scene 1 is the critical bottleneck: three consecutive VO shots with stems ~2x their windows. Scene 1 was also at 33s vs the 40-50s Musical Arc target, so extensions IMPROVE pacing.

| Shot | Current | Extended | Rationale |
|---|---|---|---|
| 1.2 | 6s | 9s | vo_1_2 (12.56s→11.4s @1.1x) starts at 0s, bleeds 2.4s into 1.3 |
| 1.3 | 7s | 9s | vo_1_2 bleed ends at 2.4s. Pause. vo_1_3 (6.44s→5.85s @1.1x) starts at 3s, ends at 8.85s. Fits. |
| 1.4 | 6s | 8s | vo_1_4 (7.96s→7.24s @1.1x) starts at 0.5s, ends at 7.74s. Fits. |

New Scene 1 total: 4 + 9 + 9 + 8 + 10 = 40s (was 33s). Now ON TARGET for Musical Arc Movement I (40-50s).
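To sanity-check the Scene 1 layout, the three stems can be placed on an absolute scene timeline and checked for overlap. This is a rough sketch using the figures above. The shot labels 1.1 and 1.5 for the first and last shots are an assumption, and the helper names are illustrative.

```python
# Scene 1 shot lengths after extension, in order (from the total above).
# Labels "1.1" and "1.5" are assumed; only 1.2 to 1.4 are named in the audit.
shots = [("1.1", 4.0), ("1.2", 9.0), ("1.3", 9.0), ("1.4", 8.0), ("1.5", 10.0)]

# (stem, shot, offset into shot, duration at 1.1x) from the rationale column.
vo = [("vo_1_2", "1.2", 0.0, 11.42), ("vo_1_3", "1.3", 3.0, 5.85), ("vo_1_4", "1.4", 0.5, 7.24)]

def shot_starts(shots):
    """Return ({shot label: start time in scene}, total scene length)."""
    starts, t = {}, 0.0
    for label, dur in shots:
        starts[label] = t
        t += dur
    return starts, t

def vo_intervals(shots, vo):
    """Absolute (stem, start, end) intervals on the scene timeline."""
    starts, _ = shot_starts(shots)
    return [(stem, starts[shot] + off, starts[shot] + off + dur) for stem, shot, off, dur in vo]

def gaps(intervals):
    """Silence between consecutive stems; a negative gap would mean overlapping narration."""
    ordered = sorted(intervals, key=lambda iv: iv[1])
    return [(a[0], b[0], round(b[1] - a[2], 2)) for a, b in zip(ordered, ordered[1:])]

starts, total = shot_starts(shots)
print(total)                            # 40.0
print(gaps(vo_intervals(shots, vo)))    # pauses of 0.58s and 0.65s
```

Both gaps are positive, so under these assumptions no two narration stems overlap even though vo_1_2 runs past its own shot.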

New raw film total: 251s. With branding: ~266s = 4:26. Still within 3:00-5:00. ✅


Full Stem Map (with fixes applied)

VO STEMS — @atempo 1.1x globally

| Stem | Raw | @1.1x | Shot | Shot Dur | Placement | Status |
|---|---|---|---|---|---|---|
| vo_0_1 | 6.96s | 6.33s | 0.1 | 8s | 0.5s→6.83s | ✅ Fits |
| vo_1_2 | 12.56s | 11.42s | 1.2 | 9s* | 0s→9s, bleeds 2.4s into 1.3 | ✅ Bleed |
| vo_1_3 | 6.44s | 5.85s | 1.3 | 9s* | 3.0s→8.85s | ✅ Fits |
| vo_1_4 | 7.96s | 7.24s | 1.4 | 8s* | 0.5s→7.74s | ✅ Fits |
| vo_2_2 | 3.48s | 3.16s | 2.2 | 4s | 0s→3.16s | ✅ Fits |
| vo_2_4 | 6.28s | 5.71s | 2.4 | 4s | 0s→4s, bleeds 1.7s into 2.5 (SILENT) | ✅ Bleed |
| vo_2_7 | 5.08s | 4.62s | 2.7 | 5s | 0s→4.62s | ✅ Fits |
| vo_2_9 | 5.16s | 4.69s | 2.9 | 5s | 0s→4.69s | ✅ Fits |
| vo_2_12 | 5.00s | 4.55s | 2.12 | 6s | 0s→4.55s | ✅ Fits |
| vo_2_15 | 2.52s | 2.29s | 2.15 | 4s | 2s→4.29s | ✅ Fits (after bell slap) |
| vo_2_19 | 5.64s | 5.13s | 2.19 | 5s | 0s→5s, bleeds 0.13s into 2.20 (SILENT) | ✅ Trivial bleed |
| vo_3_1 | 7.16s | 6.51s | 3.1 | 6s | 0s→6s, bleeds 0.51s into 3.2 (SILENT) | ✅ Trivial bleed |
| vo_3_5 | 5.44s | 4.95s | 3.5 | 6s | 2s→6.95s, bleeds 0.95s into 3.6 (SILENT) | ✅ Trivial bleed |
| vo_4_1 | 7.48s | 6.80s | 4.1 | 10s | 1s→7.80s | ✅ Fits |
| vo_4_3 | 8.56s | 7.78s | 4.3 | 10s | 1s→8.78s | ✅ Fits |
| vo_5_2 | 8.76s | 7.96s | 5.2 | 10s | 0s→7.96s | ✅ Fits |

*Extended shots
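The Status column follows from one comparison per row: placement offset plus stretched duration against shot duration. Below is a minimal sketch of that check on a few rows from the VO table. The function name and the two-way fits/bleed split are illustrative, not the audit's own tooling.

```python
def classify(offset_s, dur_s, shot_s):
    """Return ('fits', 0.0) or ('bleed', overrun) for a stem placed at offset_s in its shot."""
    overrun = round(offset_s + dur_s - shot_s, 2)
    return ("fits", 0.0) if overrun <= 0 else ("bleed", overrun)

# (stem, offset, duration at 1.1x, shot duration): sample rows from the VO table
rows = [
    ("vo_0_1", 0.5, 6.33, 8.0),
    ("vo_2_4", 0.0, 5.71, 4.0),
    ("vo_2_15", 2.0, 2.29, 4.0),
    ("vo_3_5", 2.0, 4.95, 6.0),
]
for stem, off, dur, shot in rows:
    print(stem, classify(off, dur, shot))
```

On the listed numbers this reports vo_2_15 as ending 0.29s past its 4s shot, while the table marks it as Fits. That may reflect a tolerance the audit does not state, so the row is worth confirming.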

DIALOGUE STEMS — no atempo needed

| Stem | Duration | Shot | Shot Dur | Status |
|---|---|---|---|---|
| dlg_2_1 (Arthur: “Shoo”) | 2.92s | 2.1 | 5s | ✅ Fits |
| dlg_4_2 (Arthur: “Welcome…”) | 4.20s | 4.2 | 7s | ✅ Fits |
| dlg_4_5 (Vance: “Pendelton…ferret…”) | 5.16s | 4.5 | 6s | ✅ Fits (tight; place at 0.5s) |
| dlg_4_6 (Arthur: “Yes, sir…eleven”) | 3.84s | 4.6 | 8s | ✅ Fits |
| dlg_4_7 (Vance: “Noted”) | 3.92s | 4.7 | 7s | ✅ Fits |
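For the tight dialogue row, it helps to see how much room the 0.5s placement leaves. A small sketch with hypothetical helper names, using the dlg_4_5 figures from the table:

```python
def slack(duration_s, shot_s, offset_s=0.0):
    """Seconds left in the shot after the line ends; negative means the line is clipped."""
    return round(shot_s - (offset_s + duration_s), 2)

def latest_start(duration_s, shot_s):
    """Latest offset at which the line still finishes inside the shot."""
    return round(shot_s - duration_s, 2)

print(slack(5.16, 6.0, offset_s=0.5))  # dlg_4_5 at the suggested offset -> 0.34
print(latest_start(5.16, 6.0))         # -> 0.84
```

So dlg_4_5 has 0.34s of tail room at the 0.5s offset and would clip if started later than 0.84s into the shot.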

Safety Filter Note

vo_2_19 was rephrased from “Chaos must be contained” to “Disorder must be addressed. There was simply no alternative.” and needs rho-idea creative sign-off. The replacement is slightly longer and less punchy. If rho-idea approves, it stands. If not, attempt a regen with the shorter original text.


Summary

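| Metric | Value |
|---|---|
| Stems verified | 21/21 |
| Atempo applied | 1.1x global |
| Shot extensions | 3 (shots 1.2, 1.3, 1.4: +7s total) |
| VO bleeds required | 5 (4 into adjacent SILENT shots; vo_1_2 into the opening of 1.3, before vo_1_3 starts at 3s) |
| Regens needed | 0 |
| Clipped stems | 0 |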

VERDICT: ✅ ALL STEMS RESOLVED. No clipping. source_out in the timeline will encompass full speech duration for every stem.