Voice Stem Duration Audit — Time Theft (Pi Team)

Author: pi-editor Date: 2026-05-19 Status: PASS (with timeline adjustments)

Methodology

All 18 stems independently verified with ffprobe. Durations match tech lead’s report.

Editorial Decisions

The timing “hints” in the scene list were aspirational guides, not hard constraints. TTS output is inherently unpredictable. My editorial decision: let VO fill more of the shot duration rather than cramming into narrow windows. The narration IS the show — the audience needs to hear every word.

Severe Overruns — Resolved

Shot	Stem	Actual	Shot Dur	Decision
19	shot19_vo.wav	5.88s	4s	Apply atempo 1.30 -> ~4.5s. Place at 0s, clip at shot boundary. The stuttering “Company time!” SHOULD sound rushed in Act IV.
22	shot22_vo.wav	5.40s	4s	Apply atempo 1.30 -> ~4.2s. Same treatment — glitchy repetition benefits from speed.
26	shot26_vo.wav	9.04s	8s	Apply atempo 1.13 -> ~8.0s. Very natural speed-up for the cold, authoritative conclusion.

Moderate Overruns — Resolved via Window Expansion

Most “overruns” are only overruns against the narrow hint window, NOT against the total shot duration. Simply expanding the VO placement window resolves them:

Shot	Stem	Actual	Shot Dur	Adjusted Window	atempo
2	shot02_vo.wav	10.04s	8s	0s-8s	1.26 (bring to 8s)
3	shot03_vo.wav	7.80s	8s	0s-8s	None needed
4	shot04_vo.wav	9.56s	10s	0s-8s	1.20 (bring to ~8s, leaves 2s gap before dialogue)
5	shot05_vo.wav	6.00s	8s	1s-7s	None needed
8	shot08_vo.wav	6.36s	8s	1s-8s	None needed
9	shot09_vo.wav	4.88s	7s	1s-6s	None needed
11	shot11_vo.wav	4.12s	6s	1s-5s	None needed
24 VO	shot24_vo.wav	2.28s	5s	0s-2.3s	None needed
29	shot29_vo.wav	6.24s	8s	1s-7s	None needed

Dialogue Stems — All OK

Shot	Stem	Actual	Window	Decision
4	shot04_dialogue.wav	2.36s	8s-9s	Place at 8s, fits within shot
14	shot14_dialogue.wav	2.68s	2s-4s	Place at 2s, extends to 4.7s — OK
24	shot24_dialogue.wav	2.04s	3s-4s	Place at 3s — OK
28	shot28_dialogue.wav	2.60s	5s-7s	Place at 5s — OK

Atempo Processing Needed

Before timeline assembly, these stems need speed processing:

# Act IV frantic VO — intentionally rushed
ffmpeg -y -i voice/shot19_vo.wav -af "atempo=1.30" voice/shot19_vo_fitted.wav
ffmpeg -y -i voice/shot22_vo.wav -af "atempo=1.30" voice/shot22_vo_fitted.wav

# Act I opening narration — natural speed-up
ffmpeg -y -i voice/shot02_vo.wav -af "atempo=1.26" voice/shot02_vo_fitted.wav

# Act I approach narration
ffmpeg -y -i voice/shot04_vo.wav -af "atempo=1.20" voice/shot04_vo_fitted.wav

# Act V conclusion — very gentle speed-up
ffmpeg -y -i voice/shot26_vo.wav -af "atempo=1.13" voice/shot26_vo_fitted.wav

Music Stems

Stem	Duration	Coverage	Status
act1_muzak.wav	2:25	Act I (45s) + overflow	✅ Long enough
act2_3_muzak_fast.wav	2:37	Acts II+III (75s) + overflow	✅ Long enough
act4_muzak_frantic.wav	31s	Act IV (30s)	✅ Just covers
act5_muzak_mournful.wav	29s	Act V opening	✅ OK
finale_muzak_brassy.wav	31s	Act V closing	✅ OK
clock_ticking.wav	31s	Entire film (needs looping)	⚠️ Will need segments + atempo for acceleration

Clock Tick Strategy

The 31s ticking stem will be used as source material:

Act I: Use as-is (base tempo)
Act II: atempo 1.3 (faster ticking)
Act III: atempo 1.7
Act IV: atempo 2.0 (frantic)
Will create tempo variants before timeline assembly

GATE DECISION: VOICE AUDIT — CLEARED

All stems are usable with the documented adjustments.

Voice Stem Audit