Step 3 — Voiceover & Dialogue Stem Audit
Reviewer: kappa-editor
Date: 2026-05-18
All stems are 16-bit PCM, 24 kHz, mono WAV. Format is consistent across the set.
Stem Duration vs Shot Timing Window
| Shot | Audio Class | Shot Dur | Timing Window | Stem File | Stem Dur | Fit? |
|---|---|---|---|---|---|---|
| 1 | VO | 8s | 2s–8s (6s) | shot01_vo.wav | 12.6s | Trim needed |
| 1a | VO | 8s | 1s–7.5s (6.5s) | shot01a_vo.wav | 15.4s | Trim needed |
| 2 | DIALOGUE | 12s | 1s–9s (8s) | shot02_dialogue_stanton.wav | 13.6s | Trim needed |
| 4 | DIALOGUE | 12s | 1s–11s (10s) | shot04_dialogue_clippy.wav | 8.7s | ✅ Fits |
| 5 | DIALOGUE | 10s | 2s–7s (5s) | shot05_dialogue_highlighter.wav | 11.1s | Trim needed |
| 8 | DIALOGUE | 5s | 1s–4.5s (3.5s) | shot08_dialogue_stanton.wav | 12.4s | Heavy trim |
| 9a | DIALOGUE | 5s | 1s–4.5s (3.5s) | shot09a_dialogue_stanton.wav | 5.0s | Trim needed |
| 9b (S) | COMPOUND | 6s | 1s–3s (2s) | shot09b_dialogue_stanton.wav | 8.4s | Heavy trim |
| 9b (H) | COMPOUND | 6s | 3.5s–6s (2.5s) | shot09b_dialogue_highlighter.wav | 6.9s | Heavy trim |
| 9c | DIALOGUE | 4s | 1.5s–3.5s (2s) | shot09c_dialogue_stanton.wav | 3.4s | Trim needed |
| 10 | VO | 4s | 0.5s–3.5s (3s) | shot10_vo_stanton.wav | 5.1s | Trim needed |
| 10c (S) | COMPOUND | 5s | 0.5s–4s (3.5s) | shot10c_dialogue_stanton.wav | 7.8s | Heavy trim |
| 11 | DIALOGUE | 8s | 1s–7s (6s) | shot11_dialogue_clippy.wav | 8.9s | Trim needed |
| 12b | VO | 6s | 1s–5s (4s) | shot12b_vo.wav | 9.7s | Trim needed |
| 13 | DIALOGUE | 12s | 2s–9s (7s) | shot13_dialogue_stanton.wav | 13.5s | Trim needed |
| 14 | VO | 5s | 1s–4s (3s) | shot14_vo_stanton.wav | 5.1s | Trim needed |
| 15 | VO | 10s | 2s–8s (6s) | shot15_vo_stanton.wav | 3.9s | ✅ Fits |
| 19 | DIALOGUE | 8s | 1s–7s (6s) | shot19_dialogue_stanton.wav | 9.4s | Trim needed |
| 20 | DIALOGUE | 6s | 1s–5s (4s) | shot20_dialogue_highlighter.wav | 4.4s | Tight fit (0.4s over window) |
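The Fit? column comes down to comparing each stem's raw duration against the length of its timing window. Below is a minimal Python sketch of that arithmetic, using four rows copied from the table above. The `StemCheck` class and its field names are hypothetical, introduced only for illustration. The sketch reports raw overage only; it does not try to reproduce the "Trim needed" / "Heavy trim" labels, since this audit does not state the thresholds behind them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StemCheck:
    """One table row (hypothetical structure, not part of the pipeline)."""
    shot: str
    window_start: float  # seconds into the shot
    window_end: float    # seconds into the shot
    stem_dur: float      # full stem length in seconds, silence included

    @property
    def window_len(self) -> float:
        return round(self.window_end - self.window_start, 3)

    @property
    def overage(self) -> float:
        """Seconds by which the raw stem exceeds the window (negative = headroom)."""
        return round(self.stem_dur - self.window_len, 3)

    @property
    def fits_untrimmed(self) -> bool:
        return self.overage <= 0


# Values taken from the table rows for shots 4, 8, 15 and 20.
rows = [
    StemCheck("4", 1.0, 11.0, 8.7),
    StemCheck("8", 1.0, 4.5, 12.4),
    StemCheck("15", 2.0, 8.0, 3.9),
    StemCheck("20", 1.0, 5.0, 4.4),
]

for r in rows:
    status = "fits" if r.fits_untrimmed else f"over by {r.overage:.1f}s"
    print(f"shot {r.shot}: window {r.window_len:.1f}s, stem {r.stem_dur:.1f}s -> {status}")
```

Overage here is measured on the untrimmed file, so a positive value only says the stem needs trimming, not whether the speech itself fits the window.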
Trimming Strategy
Most stems carry leading and trailing silence from TTS generation. The source_in/source_out fields in the timeline JSON will be set so that only the speech content is extracted. Not a blocker; this is standard post-production workflow.
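One way to derive candidate source_in/source_out values is to scan each stem for its first and last non-silent frame. The sketch below uses only the Python standard library and assumes 16-bit mono PCM, as reported for these stems. The function names, the amplitude threshold of 500 and the 10 ms frame size are illustrative assumptions, and so is the idea that source_in/source_out are expressed in seconds; the timeline JSON schema is not shown in this audit. Peak-amplitude detection is a simple stand-in, not necessarily the method the pipeline uses.

```python
import array
import math
import os
import sys
import tempfile
import wave


def speech_bounds(path, threshold=500, frame_ms=10):
    """Return (source_in, source_out) in seconds for a 16-bit mono PCM WAV.

    A frame counts as speech when its peak absolute sample exceeds
    `threshold` (an assumed value). Returns None if no frame qualifies.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    hop = max(1, rate * frame_ms // 1000)
    loud = [
        i for i in range(0, len(samples), hop)
        if max(abs(s) for s in samples[i:i + hop]) > threshold
    ]
    if not loud:
        return None
    start = loud[0]
    end = min(loud[-1] + hop, len(samples))
    return start / rate, end / rate


def write_test_stem(path, rate=24000):
    """Write a synthetic stem: 0.5 s silence, 1.0 s of 440 Hz tone, 0.5 s silence."""
    tone = [int(8000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
    pcm = array.array("h", [0] * (rate // 2) + tone + [0] * (rate // 2))
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())


# Demo on a synthetic file, not on one of the audited stems.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_stem.wav")
write_test_stem(demo_path)
bounds = speech_bounds(demo_path)
print(bounds)
```

Real TTS output is rarely digitally silent at the edges, so the threshold would need tuning per voice, and a small padding margin around the detected bounds is usually worth adding before writing values into the timeline.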
Previously Missing Stems — RESOLVED
| Shot | Character | Line | Status |
|---|---|---|---|
| 10c | Clippy | Whimpering whine (4.2s–5s) | ✅ Delivered: 4.76s, Puck voice |
| 21 | Clippy | “Hey, quick question… do paperclips get PTO? Because I think I’m stuck here.” | ✅ Delivered: 9.04s, Puck voice, exhausted/wry delivery |
Completeness Summary
- 21/21 stems delivered (100%) ✅
- Format consistency: ✅ All PCM s16le, 24kHz, mono
- VO INVENTORY COMPLETE. Step 3 audio gate: PASSED.
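The format-consistency check can be reproduced with Python's standard `wave` module. The sketch below is an assumption about how such a gate might be scripted, not the tool used for this audit. `EXPECTED`, `stem_format` and `format_mismatches` are hypothetical names, and the demo runs against a generated one-second silent file rather than the delivered stems.

```python
import os
import tempfile
import wave

# Target format as stated in this audit: 16-bit PCM, 24 kHz, mono.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 24000}


def stem_format(path):
    """Read basic format fields and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": wf.getframerate(),
            "duration_s": round(wf.getnframes() / wf.getframerate(), 2),
        }


def format_mismatches(info):
    """Return {field: (actual, expected)} for every field that deviates."""
    return {k: (info[k], v) for k, v in EXPECTED.items() if info[k] != v}


# Demo: one second of silence in the expected format.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_format.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(b"\x00\x00" * 24000)

info = stem_format(demo_path)
print(info, format_mismatches(info))
```

Running `stem_format` over every delivered file and requiring an empty `format_mismatches` result would give a scriptable version of the format line in the summary, with `duration_s` available for cross-checking the Stem Dur column.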