Audio Strategy Pivot — Impact Audit (RESOLVED: REVERTED)
Author: kappa-editor
Date: 2026-05-18
Trigger: Coordinator directive via Coach — use Veo native audio for character dialogue, keep TTS for narrator VO only.
Stem Classification: KEEP vs DISCARD
KEEP — Narrator [VO] (no visible speaking, TTS stems remain valid)
| Stem | Shot | Audio Class | Why Keep |
|---|---|---|---|
| shot01_vo.wav | 1 | [VO] | Narrator over desk B-roll. No characters visible. |
| shot01a_vo.wav | 1a | [VO] | Narrator over desk B-roll. No characters visible. |
| shot10_vo_stanton.wav | 10 | [VO] | Stanton VO over Clippy action montage. Clippy visible but NOT speaking. |
| shot12b_vo.wav | 12b | [VO] | Narrator over desk wide shot. No characters visible. |
| shot14_vo_stanton.wav | 14 | [VO] | Stanton VO over crisis B-roll. No characters visible. |
| shot15_vo_stanton.wav | 15 | [VO] | Stanton VO over Clippy’s leap. Clippy visible but NOT speaking. |
Total KEEP: 6 stems
DISCARD — Character [DIALOGUE] (visible speaking → Veo native audio)
| Stem | Shot | Audio Class | Character Speaking |
|---|---|---|---|
| shot02_dialogue_stanton.wav | 2 | [DIALOGUE] | Stanton interview, appears to speak |
| shot04_dialogue_clippy.wav | 4 | [DIALOGUE] | Clippy interview, appears to speak |
| shot05_dialogue_highlighter.wav | 5 | [DIALOGUE] | Highlighter interview, appears to speak |
| shot08_dialogue_stanton.wav | 8 | [DIALOGUE] | Stanton B-roll, appears to speak |
| shot09a_dialogue_stanton.wav | 9a | [DIALOGUE] | Stanton B-roll, appears to speak |
| shot09c_dialogue_stanton.wav | 9c | [DIALOGUE] | Stanton B-roll, appears to speak |
| shot11_dialogue_clippy.wav | 11 | [DIALOGUE] | Clippy interview, appears to speak |
| shot13_dialogue_stanton.wav | 13 | [DIALOGUE] | Stanton interview, appears to speak |
| shot19_dialogue_stanton.wav | 19 | [DIALOGUE] | Stanton interview, appears to speak |
| shot20_dialogue_highlighter.wav | 20 | [DIALOGUE] | Highlighter interview, appears to speak |
| shot21_dialogue_clippy.wav | 21 | [DIALOGUE] | Clippy hybrid interview, appears to speak |
Total DISCARD: 11 stems
COMPOUND — Needs Special Handling
| Stems | Shot | Issue |
|---|---|---|
| shot09b_dialogue_stanton.wav + shot09b_dialogue_highlighter.wav | 9b | Both characters visible and speaking sequentially. Veo must generate both voices in one clip. |
| shot10c_dialogue_stanton.wav + shot10c_dialogue_clippy.wav | 10c | Both characters visible. Stanton speaks, Clippy whimpers. Veo must handle both. |
Total DISCARD (compound): 4 stems
Grand total: 6 KEEP, 15 DISCARD (11 single-speaker + 4 compound), 21 stems in all
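The KEEP/DISCARD split follows directly from the stem naming, so it can be re-derived mechanically whenever stems are added. Below is a minimal sketch, assuming every stem follows the shot&lt;id&gt;_&lt;vo|dialogue&gt;[_&lt;speaker&gt;].wav pattern seen in the tables. The helper name `classify` is hypothetical, and "more than one dialogue stem in the same shot" is used as a stand-in for the audit's "both characters visible and speaking" definition of compound.

```python
import re
from collections import Counter

# Stem filenames exactly as listed in the KEEP, DISCARD and COMPOUND tables.
STEMS = [
    "shot01_vo.wav", "shot01a_vo.wav", "shot10_vo_stanton.wav",
    "shot12b_vo.wav", "shot14_vo_stanton.wav", "shot15_vo_stanton.wav",
    "shot02_dialogue_stanton.wav", "shot04_dialogue_clippy.wav",
    "shot05_dialogue_highlighter.wav", "shot08_dialogue_stanton.wav",
    "shot09a_dialogue_stanton.wav", "shot09c_dialogue_stanton.wav",
    "shot11_dialogue_clippy.wav", "shot13_dialogue_stanton.wav",
    "shot19_dialogue_stanton.wav", "shot20_dialogue_highlighter.wav",
    "shot21_dialogue_clippy.wav",
    "shot09b_dialogue_stanton.wav", "shot09b_dialogue_highlighter.wav",
    "shot10c_dialogue_stanton.wav", "shot10c_dialogue_clippy.wav",
]

# Assumed naming pattern: shot<id>_<vo|dialogue>[_<speaker>].wav
STEM_RE = re.compile(
    r"^shot(?P<shot>\d+[a-z]?)_(?P<kind>vo|dialogue)(?:_(?P<speaker>\w+))?\.wav$"
)


def classify(stems):
    """Map each stem to KEEP, DISCARD or DISCARD (compound) under the pivot rules."""
    parsed = {}
    for stem in stems:
        m = STEM_RE.match(stem)
        if m is None:
            raise ValueError(f"unexpected stem name: {stem}")
        parsed[stem] = m

    # Proxy for "compound": a shot with more than one dialogue stem.
    dialogue_per_shot = Counter(
        m["shot"] for m in parsed.values() if m["kind"] == "dialogue"
    )

    labels = {}
    for stem, m in parsed.items():
        if m["kind"] == "vo":
            labels[stem] = "KEEP"  # narrator/VO: TTS stem stays valid
        elif dialogue_per_shot[m["shot"]] > 1:
            labels[stem] = "DISCARD (compound)"
        else:
            labels[stem] = "DISCARD"
    return labels


labels = classify(STEMS)
counts = Counter(labels.values())
print(counts["KEEP"], counts["DISCARD"], counts["DISCARD (compound)"])  # 6 11 4
```

Note that shot10 (VO) and shot10c (compound) are distinct shot ids, so the VO stem for shot 10 is not swept into the compound group.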
Timeline Impact — Actually Cleaner
The design_brief.md editorial guardrails already specified: “complete, awkward silence during interviews (save for the low hum of the HVAC system).” In other words, no music under interview/dialogue shots was always the plan, so the pivot fits the existing brief:
New Audio Architecture
| Shot Type | Video Audio | Voice Track (TTS) | Music Track |
|---|---|---|---|
| [VO] | --generate-audio=false (no Veo audio) | ✅ TTS stem placed on timeline | ✅ Music plays, ducked under voice |
| [DIALOGUE] | ✅ Veo native audio (embedded dialogue) | ❌ None | ❌ No music (documentary silence) |
| [COMPOUND] | ✅ Veo native audio (multi-character) | ❌ None | ❌ No music |
| [SILENT] | ✅ Veo native audio (ambient/foley) | ❌ None | ✅ Music at full level |
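The table above can be encoded as a lookup so that a build script applies the same per-shot rules every time. A minimal sketch follows; the names `AudioRule`, `AUDIO_RULES` and `validate` are hypothetical, and the `veo_audio=False` field simply mirrors the --generate-audio=false setting rather than calling any real Veo API.

```python
from dataclasses import dataclass
from enum import Enum


class ShotType(Enum):
    VO = "VO"
    DIALOGUE = "DIALOGUE"
    COMPOUND = "COMPOUND"
    SILENT = "SILENT"


class Music(Enum):
    OFF = "no music (documentary silence)"
    DUCKED = "plays, ducked under voice"
    FULL = "full level"


@dataclass(frozen=True)
class AudioRule:
    veo_audio: bool   # False = clip rendered with --generate-audio=false
    tts_stem: bool    # True = a TTS stem is placed on the voice track
    music: Music


# One entry per row of the New Audio Architecture table.
AUDIO_RULES = {
    ShotType.VO: AudioRule(veo_audio=False, tts_stem=True, music=Music.DUCKED),
    ShotType.DIALOGUE: AudioRule(veo_audio=True, tts_stem=False, music=Music.OFF),
    ShotType.COMPOUND: AudioRule(veo_audio=True, tts_stem=False, music=Music.OFF),
    ShotType.SILENT: AudioRule(veo_audio=True, tts_stem=False, music=Music.FULL),
}


def validate(rules):
    """Check the separation properties the table is meant to guarantee."""
    for shot_type, rule in rules.items():
        # No shot carries both Veo audio and a TTS stem.
        if rule.veo_audio and rule.tts_stem:
            raise ValueError(f"{shot_type.name}: two voice sources")
        # Music is ducked exactly where a TTS stem exists (TTS is the only duck key).
        if (rule.music is Music.DUCKED) != rule.tts_stem:
            raise ValueError(f"{shot_type.name}: duck key mismatch")


validate(AUDIO_RULES)
```

Keeping the rules as frozen data rather than scattered if-statements means a change to one row of the table is a one-line change here, and `validate` catches a row that would put two voice sources on one shot.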
Why This Is Actually Better
- No ducking conflict. Music only plays during [VO] and [SILENT] shots. The voice track (TTS stems) is the only duck key signal. Clean separation.
- Authentic mockumentary feel. Real documentaries cut music during talking-head interviews. The silence makes the deadpan comedy land harder.
- Simpler timeline. Voice track has 6 clips instead of 21. Music track only needs to cover ~90s of VO/SILENT footage, not the full runtime.
- Natural audio transitions. Veo-generated ambient (HVAC hum, room tone) in dialogue clips provides organic sound beds. No need to manufacture silence.
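The ducking argument reduces to a small rule: music is muted under dialogue, ducked only while a TTS stem is playing, and at full level otherwise. The sketch below shows that rule plus the music-coverage sum. The -12 dB duck depth is a hypothetical value (the audit does not specify one), and `music_gain_db` and `music_coverage_s` are illustrative names, not part of any existing tool.

```python
import math

# Hypothetical mix levels; the audit does not specify a duck depth.
FULL_DB = 0.0
DUCK_DB = -12.0
MUTED_DB = -math.inf

# Shot types that get no music at all (documentary silence).
NO_MUSIC = {"DIALOGUE", "COMPOUND"}


def music_gain_db(shot_type: str, tts_active: bool) -> float:
    """Music level at one instant.

    Only the TTS voice track acts as the duck key. Veo-native audio in
    DIALOGUE/COMPOUND shots never needs to duck music, because music is off there.
    """
    if shot_type in NO_MUSIC:
        return MUTED_DB
    return DUCK_DB if tts_active else FULL_DB


def music_coverage_s(shots):
    """Seconds of footage the music track must cover: VO and SILENT shots only.

    `shots` is an iterable of (shot_type, duration_seconds) pairs.
    """
    return sum(dur for shot_type, dur in shots if shot_type not in NO_MUSIC)
```

With real shot durations fed to `music_coverage_s`, the ~90s estimate above can be checked rather than eyeballed.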
One Critical Concern
Veo voice consistency. The DP must ensure Veo generates consistent character voices across all dialogue shots. Stanton should always sound like a grizzled baritone. Clippy should always sound nervous. Highlighter should always sound cynical and raspy. This needs explicit prompting in Step 5.
Suggestion for DP: Include detailed voice direction in every Veo dialogue prompt:
- Stanton: “Deep, gravelly baritone. Corporate jargon delivered with military seriousness.”
- Clippy: “High-pitched, trembling, anxious. Rapid speech with nervous pauses.”
- Highlighter: “Slow, raspy drawl. Cynical and utterly disinterested.”
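One way to keep the voice direction identical across every dialogue prompt is to append it from a single table instead of retyping it per shot. A minimal sketch, assuming a plain-text prompt; the `build_dialogue_prompt` helper and the `says:` / `'s voice:` phrasing are hypothetical and not a documented Veo prompt convention.

```python
from __future__ import annotations

# Voice direction strings copied from the suggestion above.
VOICE_DIRECTION = {
    "Stanton": "Deep, gravelly baritone. Corporate jargon delivered with military seriousness.",
    "Clippy": "High-pitched, trembling, anxious. Rapid speech with nervous pauses.",
    "Highlighter": "Slow, raspy drawl. Cynical and utterly disinterested.",
}


def build_dialogue_prompt(scene: str, lines: list[tuple[str, str]]) -> str:
    """Build a prompt with the script lines followed by one voice direction per speaker.

    Speakers are listed once each, in order of first appearance, so compound
    shots (e.g. 9b, 10c) carry direction for both characters.
    """
    speakers: list[str] = []
    for speaker, _ in lines:
        if speaker not in VOICE_DIRECTION:
            raise KeyError(f"no voice direction for {speaker!r}")
        if speaker not in speakers:
            speakers.append(speaker)

    parts = [scene.strip()]
    parts += [f'{speaker} says: "{text}"' for speaker, text in lines]
    parts += [f"{speaker}'s voice: {VOICE_DIRECTION[speaker]}" for speaker in speakers]
    return "\n".join(parts)
```

Failing loudly on an unknown speaker is deliberate: a missing direction is exactly the case where voices would drift between shots.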
RESOLUTION
Coach ruling (04:59 UTC): REVERTED. DP correctly identified that our claymation characters (stapler, paperclip, highlighter) have no lips, so lip-sync is physically impossible. The coordinator’s directive therefore does not apply to our project.
Final strategy: Original plan stands.
- All 21 TTS stems KEPT
- All video generated with --generate-audio=false
- All dialogue/VO overlaid in post via timeline JSON
- Voice casting (Fenrir, Puck, etc.) preserved
- Timeline plan unchanged
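The final strategy can be guarded at assembly time: reject any video clip not rendered with audio off, and reject any TTS stem that points at a shot missing from the video track. The timeline JSON schema is not described in this audit, so the field names below (`shot`, `src`, `start_s`, `generate_audio`) and the `build_timeline` helper are hypothetical, as are the video file names in the example.

```python
import json


def build_timeline(video_clips, voice_stems):
    """Assemble a post-production timeline for the original (reverted) plan.

    Every video clip must have been rendered without Veo audio, and every TTS
    stem must land on a shot that exists on the video track.
    """
    shots = set()
    video = []
    for clip in video_clips:
        if clip.get("generate_audio", False):
            raise ValueError(f"shot {clip['shot']}: expected --generate-audio=false")
        shots.add(clip["shot"])
        video.append({"shot": clip["shot"], "src": clip["src"], "audio": False})

    voice = []
    for stem in voice_stems:
        if stem["shot"] not in shots:
            raise ValueError(f"{stem['src']}: no video clip for shot {stem['shot']}")
        voice.append({"shot": stem["shot"], "src": stem["src"], "start_s": stem["start_s"]})

    # Order voice clips by start time so the JSON reads in playback order.
    voice.sort(key=lambda item: item["start_s"])
    return {"video": video, "voice": voice}


# Hypothetical two-shot example; video file names and start times are made up.
timeline = build_timeline(
    [{"shot": "01", "src": "shot01.mp4"}, {"shot": "02", "src": "shot02.mp4"}],
    [
        {"shot": "02", "src": "shot02_dialogue_stanton.wav", "start_s": 6.0},
        {"shot": "01", "src": "shot01_vo.wav", "start_s": 0.5},
    ],
)
print(json.dumps(timeline, indent=2))
```

Run over the full shot list, the voice track should end up with all 21 stems, one per entry in the KEEP, DISCARD and COMPOUND tables.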