Voice Stem Duration Audit — Time Theft (Pi Team)
Author: pi-editor Date: 2026-05-19 Status: PASS (with timeline adjustments)
Methodology
All 18 stems independently verified with ffprobe. Durations match tech lead’s report.
Editorial Decisions
The timing “hints” in the scene list were aspirational guides, not hard constraints. TTS output is inherently unpredictable. My editorial decision: let VO fill more of the shot duration rather than cramming into narrow windows. The narration IS the show — the audience needs to hear every word.
Severe Overruns — Resolved
| Shot | Stem | Actual | Shot Dur | Decision |
|---|---|---|---|---|
| 19 | shot19_vo.wav | 5.88s | 4s | Apply atempo 1.30 -> ~4.5s. Place at 0s, clip at shot boundary. The stuttering “Company time!” SHOULD sound rushed in Act IV. |
| 22 | shot22_vo.wav | 5.40s | 4s | Apply atempo 1.30 -> ~4.2s. Same treatment — glitchy repetition benefits from speed. |
| 26 | shot26_vo.wav | 9.04s | 8s | Apply atempo 1.13 -> ~8.0s. Very natural speed-up for the cold, authoritative conclusion. |
Moderate Overruns — Resolved via Window Expansion
Most “overruns” are only overruns against the narrow hint window, NOT against the total shot duration. Simply expanding the VO placement window resolves them:
| Shot | Stem | Actual | Shot Dur | Adjusted Window | atempo |
|---|---|---|---|---|---|
| 2 | shot02_vo.wav | 10.04s | 8s | 0s-8s | 1.26 (bring to 8s) |
| 3 | shot03_vo.wav | 7.80s | 8s | 0s-8s | None needed |
| 4 | shot04_vo.wav | 9.56s | 10s | 0s-8s | 1.20 (bring to ~8s, leaves 2s gap before dialogue) |
| 5 | shot05_vo.wav | 6.00s | 8s | 1s-7s | None needed |
| 8 | shot08_vo.wav | 6.36s | 8s | 1s-8s | None needed |
| 9 | shot09_vo.wav | 4.88s | 7s | 1s-6s | None needed |
| 11 | shot11_vo.wav | 4.12s | 6s | 1s-5s | None needed |
| 24 VO | shot24_vo.wav | 2.28s | 5s | 0s-2.3s | None needed |
| 29 | shot29_vo.wav | 6.24s | 8s | 1s-7s | None needed |
Dialogue Stems — All OK
| Shot | Stem | Actual | Window | Decision |
|---|---|---|---|---|
| 4 | shot04_dialogue.wav | 2.36s | 8s-9s | Place at 8s, fits within shot |
| 14 | shot14_dialogue.wav | 2.68s | 2s-4s | Place at 2s, extends to 4.7s — OK |
| 24 | shot24_dialogue.wav | 2.04s | 3s-4s | Place at 3s — OK |
| 28 | shot28_dialogue.wav | 2.60s | 5s-7s | Place at 5s — OK |
Atempo Processing Needed
Before timeline assembly, these stems need speed processing:
# Act IV frantic VO — intentionally rushed
ffmpeg -y -i voice/shot19_vo.wav -af "atempo=1.30" voice/shot19_vo_fitted.wav
ffmpeg -y -i voice/shot22_vo.wav -af "atempo=1.30" voice/shot22_vo_fitted.wav
# Act I opening narration — natural speed-up
ffmpeg -y -i voice/shot02_vo.wav -af "atempo=1.26" voice/shot02_vo_fitted.wav
# Act I approach narration
ffmpeg -y -i voice/shot04_vo.wav -af "atempo=1.20" voice/shot04_vo_fitted.wav
# Act V conclusion — very gentle speed-up
ffmpeg -y -i voice/shot26_vo.wav -af "atempo=1.13" voice/shot26_vo_fitted.wav
Music Stems
| Stem | Duration | Coverage | Status |
|---|---|---|---|
| act1_muzak.wav | 2:25 | Act I (45s) + overflow | ✅ Long enough |
| act2_3_muzak_fast.wav | 2:37 | Acts II+III (75s) + overflow | ✅ Long enough |
| act4_muzak_frantic.wav | 31s | Act IV (30s) | ✅ Just covers |
| act5_muzak_mournful.wav | 29s | Act V opening | ✅ OK |
| finale_muzak_brassy.wav | 31s | Act V closing | ✅ OK |
| clock_ticking.wav | 31s | Entire film (needs looping) | ⚠️ Will need segments + atempo for acceleration |
Clock Tick Strategy
The 31s ticking stem will be used as source material:
- Act I: Use as-is (base tempo)
- Act II: atempo 1.3 (faster ticking)
- Act III: atempo 1.7
- Act IV: atempo 2.0 (frantic)
- Will create tempo variants before timeline assembly
GATE DECISION: VOICE AUDIT — CLEARED
All stems are usable with the documented adjustments.