Audio Strategy Pivot — Impact Audit (RESOLVED: REVERTED)

Author: kappa-editor
Date: 2026-05-18
Trigger: Coordinator directive via Coach — use Veo native audio for character dialogue, keep TTS for narrator VO only.

Stem Classification: KEEP vs DISCARD

KEEP — Narrator [VO] (no visible speaking, TTS stems remain valid)

Stem	Shot	Audio Class	Why Keep
shot01_vo.wav	1	[VO]	Narrator over desk B-roll. No characters visible.
shot01a_vo.wav	1a	[VO]	Narrator over desk B-roll. No characters visible.
shot10_vo_stanton.wav	10	[VO]	Stanton VO over Clippy action montage. Clippy visible but NOT speaking.
shot12b_vo.wav	12b	[VO]	Narrator over desk wide shot. No characters visible.
shot14_vo_stanton.wav	14	[VO]	Stanton VO over crisis B-roll. No characters visible.
shot15_vo_stanton.wav	15	[VO]	Stanton VO over Clippy’s leap. Clippy visible but NOT speaking.

Total KEEP: 6 stems

DISCARD — Character [DIALOGUE] (visible speaking → Veo native audio)

Stem	Shot	Audio Class	Character Speaking
shot02_dialogue_stanton.wav	2	[DIALOGUE]	Stanton interview, appears to speak
shot04_dialogue_clippy.wav	4	[DIALOGUE]	Clippy interview, appears to speak
shot05_dialogue_highlighter.wav	5	[DIALOGUE]	Highlighter interview, appears to speak
shot08_dialogue_stanton.wav	8	[DIALOGUE]	Stanton B-roll, appears to speak
shot09a_dialogue_stanton.wav	9a	[DIALOGUE]	Stanton B-roll, appears to speak
shot09c_dialogue_stanton.wav	9c	[DIALOGUE]	Stanton B-roll, appears to speak
shot11_dialogue_clippy.wav	11	[DIALOGUE]	Clippy interview, appears to speak
shot13_dialogue_stanton.wav	13	[DIALOGUE]	Stanton interview, appears to speak
shot19_dialogue_stanton.wav	19	[DIALOGUE]	Stanton interview, appears to speak
shot20_dialogue_highlighter.wav	20	[DIALOGUE]	Highlighter interview, appears to speak
shot21_dialogue_clippy.wav	21	[DIALOGUE]	Clippy hybrid interview, appears to speak

Total DISCARD: 11 stems

COMPOUND — Needs Special Handling

Stems	Shot	Issue
shot09b_dialogue_stanton.wav + shot09b_dialogue_highlighter.wav	9b	Both characters visible and speaking sequentially. Veo must generate both voices in one clip.
shot10c_dialogue_stanton.wav + shot10c_dialogue_clippy.wav	10c	Both characters visible. Stanton speaks, Clippy whimpers. Veo must handle both.

Total DISCARD (compound): 4 stems

Grand total: 6 KEEP, 15 DISCARD (11 single-character + 4 compound), 21 stems overall
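The tallies above can be cross-checked with a short script. This is a minimal sketch, assuming the filename convention visible in the tables (shot<id>_<vo|dialogue>[_<character>].wav) holds for every stem; the regex and helper names are illustrative and not part of the project tooling.

```python
import re
from collections import Counter

# Stem filenames copied from the KEEP, DISCARD, and COMPOUND tables.
STEMS = [
    "shot01_vo.wav", "shot01a_vo.wav", "shot10_vo_stanton.wav",
    "shot12b_vo.wav", "shot14_vo_stanton.wav", "shot15_vo_stanton.wav",
    "shot02_dialogue_stanton.wav", "shot04_dialogue_clippy.wav",
    "shot05_dialogue_highlighter.wav", "shot08_dialogue_stanton.wav",
    "shot09a_dialogue_stanton.wav", "shot09c_dialogue_stanton.wav",
    "shot11_dialogue_clippy.wav", "shot13_dialogue_stanton.wav",
    "shot19_dialogue_stanton.wav", "shot20_dialogue_highlighter.wav",
    "shot21_dialogue_clippy.wav",
    "shot09b_dialogue_stanton.wav", "shot09b_dialogue_highlighter.wav",
    "shot10c_dialogue_stanton.wav", "shot10c_dialogue_clippy.wav",
]

# Assumed naming rule, inferred from the filenames above (hypothetical).
STEM_PATTERN = re.compile(
    r"^shot(?P<shot>\d+[a-z]?)_(?P<kind>vo|dialogue)(?:_(?P<character>[a-z]+))?\.wav$"
)


def parse(stem: str) -> dict:
    """Split a stem filename into shot id, kind, and optional character."""
    match = STEM_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected stem name: {stem}")
    return match.groupdict()


def classify(stem: str) -> str:
    """Under the pivot, narrator VO stems are kept; dialogue stems are discarded."""
    return "KEEP" if parse(stem)["kind"] == "vo" else "DISCARD"


tally = Counter(classify(stem) for stem in STEMS)

# A shot is compound when more than one dialogue stem belongs to it.
dialogue_per_shot = Counter(
    parse(stem)["shot"] for stem in STEMS if parse(stem)["kind"] == "dialogue"
)
compound_shots = sorted(shot for shot, n in dialogue_per_shot.items() if n > 1)

print(dict(tally), compound_shots)
```

Running the check against the 21 listed stems is a quick way to confirm that no stem was counted twice or dropped between the three tables.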

Timeline Impact — Actually Cleaner

The editorial guardrails in design_brief.md already specified: “complete, awkward silence during interviews (save for the low hum of the HVAC system).” Interview and dialogue shots were therefore never going to carry music, so the pivot fits the existing plan without changes to the music track.

New Audio Architecture

Shot Type	Video Audio	Voice Track (TTS)	Music Track
[VO]	`--generate-audio=false` (no Veo audio)	✅ TTS stem placed on timeline	✅ Music plays, ducked under voice
[DIALOGUE]	✅ Veo native audio (embedded dialogue)	❌ None	❌ No music (documentary silence)
[COMPOUND]	✅ Veo native audio (multi-character)	❌ None	❌ No music
[SILENT]	✅ Veo native audio (ambient/foley)	❌ None	✅ Music at full level
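The table above can be expressed as a small lookup. This is a sketch only: the AudioPlan type, the field names, and the cli_flag helper are hypothetical. The only flag taken from this document is `--generate-audio=false`; the form used to enable Veo audio is not stated here, so the helper returns None for those cases rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioPlan:
    veo_native_audio: bool  # does the generated clip carry its own audio?
    tts_voice_track: bool   # is a TTS stem placed on the timeline?
    music: str              # "ducked", "full", or "none"


# One entry per row of the architecture table (proposed pivot, later reverted).
ARCHITECTURE = {
    "VO": AudioPlan(veo_native_audio=False, tts_voice_track=True, music="ducked"),
    "DIALOGUE": AudioPlan(veo_native_audio=True, tts_voice_track=False, music="none"),
    "COMPOUND": AudioPlan(veo_native_audio=True, tts_voice_track=False, music="none"),
    "SILENT": AudioPlan(veo_native_audio=True, tts_voice_track=False, music="full"),
}


def cli_flag(audio_class: str) -> Optional[str]:
    """Return the documented flag when Veo audio is disabled, else None."""
    plan = ARCHITECTURE[audio_class]
    if not plan.veo_native_audio:
        return "--generate-audio=false"
    return None  # enabling form is not given in this document


# Music is ducked only where a TTS stem exists, so the voice track
# is the single duck key signal.
ducked_classes = [name for name, plan in ARCHITECTURE.items() if plan.music == "ducked"]
print(ducked_classes, cli_flag("VO"))
```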

Why This Is Actually Better

No ducking conflict. Music only plays during [VO] and [SILENT] shots. The voice track (TTS stems) is the only duck key signal. Clean separation.
Authentic mockumentary feel. Real documentaries cut music during talking-head interviews. The silence makes the deadpan comedy land harder.
Simpler timeline. Voice track has 6 clips instead of 21. Music track only needs to cover ~90s of VO/SILENT footage, not the full runtime.
Natural audio transitions. Veo-generated ambient (HVAC hum, room tone) in dialogue clips provides organic sound beds. No need to manufacture silence.

One Critical Concern

Veo voice consistency. The DP must ensure Veo generates consistent character voices across all dialogue shots. Stanton should always sound like a grizzled baritone. Clippy should always sound nervous. Highlighter should always sound cynical and raspy. This needs explicit prompting in Step 5.

Suggestion for DP: Include detailed voice direction in every Veo dialogue prompt:

Stanton: “Deep, gravelly baritone. Corporate jargon delivered with military seriousness.”
Clippy: “High-pitched, trembling, anxious. Rapid speech with nervous pauses.”
Highlighter: “Slow, raspy drawl. Cynical and utterly disinterested.”
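One way to keep the voice direction identical across shots is to store it once and append it to each dialogue prompt. The sketch below assumes prompts are plain strings; the with_voice_direction helper and the "<Name> voice:" line format are hypothetical, not an established Veo prompt convention.

```python
# Voice direction text copied verbatim from the suggestions above.
VOICE_DIRECTION = {
    "stanton": "Deep, gravelly baritone. Corporate jargon delivered with military seriousness.",
    "clippy": "High-pitched, trembling, anxious. Rapid speech with nervous pauses.",
    "highlighter": "Slow, raspy drawl. Cynical and utterly disinterested.",
}


def with_voice_direction(base_prompt: str, characters: list[str]) -> str:
    """Append one voice-direction line per speaking character (hypothetical format)."""
    notes = [
        f"{name.capitalize()} voice: {VOICE_DIRECTION[name]}"
        for name in characters
    ]
    return base_prompt.rstrip() + "\n" + "\n".join(notes)


# Example for a compound shot with two speakers; the base prompt is illustrative.
prompt = with_voice_direction("Stanton lectures while Clippy whimpers.", ["stanton", "clippy"])
print(prompt)
```

Pulling the text from a single dictionary means every shot for a given character receives the same wording, which is the consistency the concern above asks for.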

RESOLUTION

Coach ruling (04:59 UTC): REVERTED. DP correctly identified that our claymation characters (stapler, paperclip, highlighter) have no lips — lip-sync is physically impossible. The coordinator’s directive does not apply to our project.

Final strategy: Original plan stands.

All 21 TTS stems KEPT
All video generated with `--generate-audio=false`
All dialogue/VO overlaid in post via timeline JSON
Voice casting (Fenrir, Puck, etc.) preserved
Timeline plan unchanged
