Step 5: Principal Photography — Generation Strategy

Prepared by: Marcus Delaney (Tech Lead)
Status: DRAFT — pending Step 4 sign-off

Overview

28 shots, 208s planned runtime. All shots will use genmedia-video from-frames (storyboard start + end frame interpolation) with Veo 3.1. This leverages the approved storyboard frames as keyframes for visual consistency.

Model Selection Matrix

Audio Class	Model	Audio Generation	Rationale
[DIALOGUE]	`veo-3.1-generate-001`	`--generate-audio=true`	Native lip-sync for speaking characters
[COMPOUND]	`veo-3.1-generate-001`	`--generate-audio=true`	Native lip-sync for two sequential speakers
[VO]	`veo-3.1-generate-001`	`--generate-audio=false`	VO overlay in post; VO-Safe rule applies
[SILENT]	`veo-3.1-generate-001`	`--generate-audio=false`	Ambient only, added in post
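
The matrix can be captured as a small lookup so that each shot's model and audio flag are derived from its audio class rather than typed by hand. This is a minimal sketch: `GenSettings`, `MODEL_MATRIX`, and `settings_for` are illustrative names and not part of the genmedia tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenSettings:
    model: str
    generate_audio: bool

    @property
    def audio_flag(self) -> str:
        # Render the flag in the same form the matrix uses.
        return f"--generate-audio={'true' if self.generate_audio else 'false'}"


BASE_MODEL = "veo-3.1-generate-001"

# One entry per audio class in the Model Selection Matrix.
MODEL_MATRIX = {
    "DIALOGUE": GenSettings(BASE_MODEL, True),
    "COMPOUND": GenSettings(BASE_MODEL, True),
    "VO": GenSettings(BASE_MODEL, False),
    "SILENT": GenSettings(BASE_MODEL, False),
}


def settings_for(audio_class: str) -> GenSettings:
    """Look up settings for a class written as '[VO]' or 'VO'."""
    key = audio_class.strip("[]").upper()
    if key not in MODEL_MATRIX:
        raise ValueError(f"unknown audio class: {audio_class!r}")
    return MODEL_MATRIX[key]
```

For example, `settings_for("[VO]").audio_flag` gives `--generate-audio=false`.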

Duration Strategy (Overhang Principle)

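All base clips are generated at 8s (the maximum for Veo). Shots requiring more than 8s use veo-3.1-lite-generate-001 extend operations. Where the raw clip runs longer than the planned duration, the surplus is kept as overhang for the Editor to trim.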

Shot	Planned	Audio Class	Base Clip	Extends	Raw Output	Trim Target
1.1	10s*	[VO]	8s	0	8s	8s (VO 9.36s, Editor trims)
1.2	6s	[SILENT]	8s	0	8s	6s + 2s overhang
1.3	8s	[SILENT]	8s	0	8s	full clip
1.4	6s	[SILENT]	8s	0	8s	6s + 2s overhang
1.5	10s	[COMPOUND]	8s	1	15s	10s + 4s overhang
1.6	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
1.7	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.1	10s	[COMPOUND]	8s	1	15s	10s + 4s overhang
2.2	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.3	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.4	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.5	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.6	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.7	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
2.8	6s	[SILENT]	8s	0	8s	6s + 2s overhang
2.9	6s	[SILENT]	8s	0	8s	6s + 2s overhang
3.1	8s	[DIALOGUE]	8s	0	8s	full clip
3.2	8s	[DIALOGUE]	8s	0	8s	full clip
3.3	8s	[DIALOGUE]	8s	0	8s	full clip
3.4	10s	[COMPOUND]	8s	1	15s	10s + 4s overhang
3.5	10s	[SILENT]	8s	1	15s	10s + 4s overhang
3.6	8s	[DIALOGUE]	8s	0	8s	full clip
4.1	8s	[SILENT]	8s	0	8s	full clip
4.2	8s	[SILENT]	8s	0	8s	full clip
4.3	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
4.4	6s	[DIALOGUE]	8s	0	8s	6s + 2s overhang
4.5	8s	[VO]	8s	0	8s	full clip
4.6	12s	[SILENT]	8s	1	15s	12s + 3s overhang

*Shot 1.1's planned duration was extended to 10s to accommodate the 9.36s narrator VO.
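
As a cross-check, the table can be tallied in a few lines. The sketch below transcribes the Shot, Planned, and Audio Class columns and confirms the per-class shot counts used later in this document. The Planned column sums to 210s; counting shot 1.1 at its 8s trim target instead of its 10s planned duration gives 208s, which appears to be how the Overview figure is reached.

```python
from collections import Counter

# (shot, planned seconds, audio class), transcribed from the table above.
SHOTS = [
    ("1.1", 10, "VO"), ("1.2", 6, "SILENT"), ("1.3", 8, "SILENT"),
    ("1.4", 6, "SILENT"), ("1.5", 10, "COMPOUND"), ("1.6", 6, "DIALOGUE"),
    ("1.7", 6, "DIALOGUE"),
    ("2.1", 10, "COMPOUND"), ("2.2", 6, "DIALOGUE"), ("2.3", 6, "DIALOGUE"),
    ("2.4", 6, "DIALOGUE"), ("2.5", 6, "DIALOGUE"), ("2.6", 6, "DIALOGUE"),
    ("2.7", 6, "DIALOGUE"), ("2.8", 6, "SILENT"), ("2.9", 6, "SILENT"),
    ("3.1", 8, "DIALOGUE"), ("3.2", 8, "DIALOGUE"), ("3.3", 8, "DIALOGUE"),
    ("3.4", 10, "COMPOUND"), ("3.5", 10, "SILENT"), ("3.6", 8, "DIALOGUE"),
    ("4.1", 8, "SILENT"), ("4.2", 8, "SILENT"), ("4.3", 6, "DIALOGUE"),
    ("4.4", 6, "DIALOGUE"), ("4.5", 8, "VO"), ("4.6", 12, "SILENT"),
]

class_counts = Counter(audio_class for _, _, audio_class in SHOTS)
planned_total = sum(seconds for _, seconds, _ in SHOTS)

# Shot 1.1 is planned at 10s but its trim target in the table is 8s.
trimmed_total = planned_total - (10 - 8)

print(len(SHOTS), dict(class_counts))
print(planned_total, trimmed_total)
```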

Extend Operations (5 shots)

Shot	Planned	Strategy
1.5	10s	8s base + 1 extend (15s raw) → trim to ~14s
2.1	10s	8s base + 1 extend (15s raw) → trim to ~14s
3.4	10s	8s base + 1 extend (15s raw) → trim to ~14s
3.5	10s	8s base + 1 extend (15s raw) → trim to ~14s
4.6	12s	8s base + 1 extend (15s raw) → trim to ~15s
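
The extend arithmetic can be sketched as follows. The 7s gained per extend is an assumption inferred from the "8s base + 1 extend (15s raw)" figures in this plan, not a value taken from Veo documentation, and the function names are illustrative. Shot 1.1 is a deliberate exception in the table (planned 10s, single 8s base) and is not modelled here.

```python
import math

BASE_SECONDS = 8    # base clip length used throughout this plan
EXTEND_SECONDS = 7  # assumption: inferred from 8s base + 1 extend = 15s raw


def extends_needed(planned: int) -> int:
    """Extend operations required for the raw clip to cover `planned` seconds."""
    return max(0, math.ceil((planned - BASE_SECONDS) / EXTEND_SECONDS))


def raw_seconds(planned: int) -> int:
    """Raw output length after the base clip plus any extends."""
    return BASE_SECONDS + EXTEND_SECONDS * extends_needed(planned)


def available_overhang(planned: int) -> int:
    """Seconds of raw footage beyond the planned duration."""
    return raw_seconds(planned) - planned


for planned in (6, 8, 10, 12):
    print(planned, extends_needed(planned), raw_seconds(planned),
          available_overhang(planned))
```

Under these assumptions a 10s shot has 5s of raw footage beyond its planned length, of which the plan keeps about 4s, and a 12s shot has 3s.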

Reference Image Budget Per Shot

Using from-frames, the start and end frames serve as the primary visual anchor. The motion prompt guides the interpolation. Character sheets and setting references are NOT needed for from-frames — the storyboard frames already encode character appearance and setting.

If from-frames produces poor results on specific shots, fall back to genmedia-video generate with:

Slot 1: Character sheet (Sarah or Mark)
Slot 2: Character sheet (second character, if two-shot)
Slot 3: Setting reference (diner-booth or diner-booth-dawn)
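
The slot assignment above can be sketched as a small helper. The function name and the returned label strings are hypothetical; how references are actually passed to genmedia-video generate is not specified in this plan.

```python
CHARACTERS = {"Sarah", "Mark"}
SETTINGS = {"diner-booth", "diner-booth-dawn"}


def fallback_reference_slots(characters, setting):
    """Return (slot 1, slot 2, slot 3); slot 2 is None for single-character shots."""
    if not 1 <= len(characters) <= 2:
        raise ValueError("expected one or two characters")
    unknown = set(characters) - CHARACTERS
    if unknown:
        raise ValueError(f"no character sheet for: {sorted(unknown)}")
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting reference: {setting!r}")

    slot_1 = f"character sheet: {characters[0]}"
    slot_2 = f"character sheet: {characters[1]}" if len(characters) == 2 else None
    slot_3 = f"setting reference: {setting}"
    return slot_1, slot_2, slot_3
```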

Audio Treatment Pipeline

[DIALOGUE] Shots (14 shots)

Include dialogue text in motion prompt to trigger Veo native lip-sync
Include speaker action descriptions (e.g., “speaking firmly”, “whispering tenderly”)
No TTS overlay needed — Veo generates lip-synced audio natively

[COMPOUND] Shots (3 shots: 1.5, 2.1, 3.4)

Include BOTH speakers’ dialogue in motion prompt, in sequence
Veo generates lip-sync for sequential speakers
These are all 10s shots requiring extend operations

[VO] Shots (2 shots: 1.1, 4.5)

Generate video WITHOUT audio (--generate-audio=false)
VO-Safe: No speaking actions in prompt, mouths closed
Overlay pre-generated TTS stems in post:
- 1.1: voice/lambda-narrator-sh1_1.wav (Fenrir, 9.36s)
- 4.5: Need to generate Sarah’s VO (voice/lambda-sarah-offscreen-sh4_5.wav)

[SILENT] Shots (9 shots)

Generate video WITHOUT audio (--generate-audio=false)
Ambient/SFX added in post via score or foley
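
The per-class prompt rules in this section can be sketched as a single prompt builder. The prompt wording, the `SPEAKING_WORDS` list, and the function name are all assumptions made for illustration; the exact phrasing that works best with Veo would need to be found by testing.

```python
# Words treated as speaking actions for the VO-Safe check (illustrative list).
SPEAKING_WORDS = ("speaking", "whispering", "talking", "shouting", "says")


def build_motion_prompt(audio_class, action, lines=()):
    """Assemble a motion prompt.

    `lines` is a sequence of (speaker, delivery, text) tuples, in spoken order.
    """
    cls = audio_class.strip("[]").upper()

    if cls in ("DIALOGUE", "COMPOUND"):
        minimum = 2 if cls == "COMPOUND" else 1
        if len(lines) < minimum:
            raise ValueError(f"[{cls}] needs at least {minimum} dialogue line(s)")
        spoken = " Then ".join(
            f'{speaker}, {delivery}, says: "{text}"'
            for speaker, delivery, text in lines
        )
        return f"{action} {spoken}"

    if cls not in ("VO", "SILENT"):
        raise ValueError(f"unknown audio class: {audio_class!r}")
    if lines:
        raise ValueError(f"[{cls}] prompts must not contain dialogue")
    if cls == "VO":
        # VO-Safe: reject speaking actions and ask for closed mouths.
        lowered = action.lower()
        if any(word in lowered for word in SPEAKING_WORDS):
            raise ValueError("VO-Safe violation: speaking action in prompt")
        return f"{action} Mouths closed, nobody speaks on screen."
    return action
```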

Missing Audio Assets

Shot	Need	Voice	Text
4.5	Sarah offscreen VO	TBD (Sarah’s voice)	“Take care of yourself, Mark.”

Generation Order

Batch by extend requirement:

Non-extend shots (23 shots): Generate via from-frames, 8s base clips
Extend shots (5 shots): Generate 8s base, upload to GCS, extend via Veo 3.1 Lite
Verify all: Run verify-dailies with shot manifest
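
The batching above can be sketched as a simple partition of the shot list. The shot IDs and the extend set come from the duration table; the helper name is illustrative.

```python
# Shots per act, matching the duration table (1.1-1.7, 2.1-2.9, 3.1-3.6, 4.1-4.6).
SHOTS_PER_ACT = {1: 7, 2: 9, 3: 6, 4: 6}
ALL_SHOTS = [
    f"{act}.{n}"
    for act, count in SHOTS_PER_ACT.items()
    for n in range(1, count + 1)
]

# Shots with one extend operation in the duration table.
EXTEND_SHOTS = {"1.5", "2.1", "3.4", "3.5", "4.6"}


def generation_batches(shots, extend_shots):
    """Split shots into (non-extend batch, extend batch), preserving shot order."""
    non_extend = [s for s in shots if s not in extend_shots]
    extend = [s for s in shots if s in extend_shots]
    return non_extend, extend


non_extend_batch, extend_batch = generation_batches(ALL_SHOTS, EXTEND_SHOTS)
print(len(non_extend_batch), len(extend_batch))
```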

Post-Generation Gates

genmedia-verify check-shots --dir ./dailies --min-duration 3 — basic quality check
verify-dailies --dir ./dailies --manifest shot-manifest.json — duration + extend validation
Visual review of character consistency against character sheets
Editor’s dailies review
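
The two automated gates could be chained so that a failure stops the pipeline before the manual reviews. This is a minimal sketch: the command lines are copied from the list above, it assumes both tools signal failure with a non-zero exit code, and `run_gates` is an illustrative name.

```python
import shlex
import subprocess

# Automated gates, in the order listed above.
AUTOMATED_GATES = [
    "genmedia-verify check-shots --dir ./dailies --min-duration 3",
    "verify-dailies --dir ./dailies --manifest shot-manifest.json",
]


def run_gates(commands=AUTOMATED_GATES, run=subprocess.run):
    """Run each gate in order; return the first failing command, or None.

    `run` is injectable so the sequencing can be exercised without the tools.
    """
    for command in commands:
        result = run(shlex.split(command))
        if result.returncode != 0:
            return command
    return None

# Usage (requires the tools on PATH):
#   failed = run_gates()
#   if failed:
#       raise SystemExit(f"gate failed: {failed}")
```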

This strategy will be finalized after Step 4 sign-off.
