Step 5: Principal Photography — Generation Strategy
Prepared by: Marcus Delaney (Tech Lead)
Status: DRAFT — pending Step 4 sign-off
Overview
28 shots, 210s planned runtime (the sum of the Planned column in the duration table, with Shot 1.1 at 10s). All shots will use genmedia-video from-frames (storyboard start + end frame interpolation) with Veo 3.1, so the approved storyboard frames act as keyframes for visual consistency.
Model Selection Matrix
| Audio Class | Model | Audio Generation | Rationale |
|---|---|---|---|
| [DIALOGUE] | veo-3.1-generate-001 | --generate-audio=true | Native lip-sync for speaking characters |
| [COMPOUND] | veo-3.1-generate-001 | --generate-audio=true | Native lip-sync for two sequential speakers |
| [VO] | veo-3.1-generate-001 | --generate-audio=false | VO overlay in post; VO-Safe rule applies |
| [SILENT] | veo-3.1-generate-001 | --generate-audio=false | Ambient only, added in post |
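
The matrix above can be captured as a small lookup so that each shot resolves its model and audio flag from its audio class. This is a minimal sketch: the names GENERATION_SETTINGS, settings_for, and audio_flag are illustrative and not part of genmedia-video; only the model IDs and flag values come from the table.

```python
# Hypothetical lookup mirroring the Model Selection Matrix.
# Model IDs and flag values are copied from the table; the structure is an assumption.
GENERATION_SETTINGS = {
    "DIALOGUE": {"model": "veo-3.1-generate-001", "generate_audio": True},
    "COMPOUND": {"model": "veo-3.1-generate-001", "generate_audio": True},
    "VO": {"model": "veo-3.1-generate-001", "generate_audio": False},
    "SILENT": {"model": "veo-3.1-generate-001", "generate_audio": False},
}


def settings_for(audio_class: str) -> dict:
    """Return model and audio settings for a class tag such as "[VO]" or "VO"."""
    key = audio_class.strip("[] ").upper()
    if key not in GENERATION_SETTINGS:
        raise ValueError(f"unknown audio class: {audio_class!r}")
    return GENERATION_SETTINGS[key]


def audio_flag(audio_class: str) -> str:
    """Render the --generate-audio flag as it appears in the matrix."""
    enabled = settings_for(audio_class)["generate_audio"]
    return f"--generate-audio={'true' if enabled else 'false'}"
```

Keeping the mapping in one place means a change to the matrix (for example, enabling audio on [SILENT] shots) touches a single entry rather than every shot command.
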
Duration Strategy (Overhang Principle)
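All base clips are generated at 8s (the maximum for Veo), so shots planned shorter than 8s carry overhang for the Editor to trim. Shots planned above 8s add one extend operation via veo-3.1-lite-generate-001, except Shot 1.1 (see the footnote under the table).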
| Shot | Planned | Audio Class | Base Clip | Extends | Raw Output | Trim Target |
|---|---|---|---|---|---|---|
| 1.1 | 10s* | [VO] | 8s | 0 | 8s | 8s (VO 9.36s, Editor trims) |
| 1.2 | 6s | [SILENT] | 8s | 0 | 8s | 6s + 2s overhang |
| 1.3 | 8s | [SILENT] | 8s | 0 | 8s | full clip |
| 1.4 | 6s | [SILENT] | 8s | 0 | 8s | 6s + 2s overhang |
| 1.5 | 10s | [COMPOUND] | 8s | 1 | 15s | 10s + 4s overhang |
| 1.6 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 1.7 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.1 | 10s | [COMPOUND] | 8s | 1 | 15s | 10s + 4s overhang |
| 2.2 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.3 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.4 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.5 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.6 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.7 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.8 | 6s | [SILENT] | 8s | 0 | 8s | 6s + 2s overhang |
| 2.9 | 6s | [SILENT] | 8s | 0 | 8s | 6s + 2s overhang |
| 3.1 | 8s | [DIALOGUE] | 8s | 0 | 8s | full clip |
| 3.2 | 8s | [DIALOGUE] | 8s | 0 | 8s | full clip |
| 3.3 | 8s | [DIALOGUE] | 8s | 0 | 8s | full clip |
| 3.4 | 10s | [COMPOUND] | 8s | 1 | 15s | 10s + 4s overhang |
| 3.5 | 10s | [SILENT] | 8s | 1 | 15s | 10s + 4s overhang |
| 3.6 | 8s | [DIALOGUE] | 8s | 0 | 8s | full clip |
| 4.1 | 8s | [SILENT] | 8s | 0 | 8s | full clip |
| 4.2 | 8s | [SILENT] | 8s | 0 | 8s | full clip |
| 4.3 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 4.4 | 6s | [DIALOGUE] | 8s | 0 | 8s | 6s + 2s overhang |
| 4.5 | 8s | [VO] | 8s | 0 | 8s | full clip |
| 4.6 | 12s | [SILENT] | 8s | 1 | 15s | 12s + 3s overhang |
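*Shot 1.1's planned duration was raised to 10s to accommodate the 9.36s narrator VO. No extend operation is scheduled for it: the raw clip stays at 8s and the Editor handles the trim.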
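
The arithmetic behind the Extends and Raw Output columns can be sketched as follows. The 7s gain per extend is an assumption inferred from the table (8s base + 1 extend = 15s raw), not a documented Veo figure, and the function names are illustrative.

```python
import math

BASE_CLIP_S = 8  # base clip length stated in this plan
EXTEND_S = 7     # assumption: inferred from "8s base + 1 extend = 15s raw"


def extends_needed(planned_s: float) -> int:
    """Number of extend operations needed to cover the planned duration."""
    if planned_s <= BASE_CLIP_S:
        return 0
    return math.ceil((planned_s - BASE_CLIP_S) / EXTEND_S)


def raw_output_s(planned_s: float) -> int:
    """Raw clip length after the base generation plus any extends."""
    return BASE_CLIP_S + extends_needed(planned_s) * EXTEND_S


for planned in (6, 8, 10, 12):
    raw = raw_output_s(planned)
    print(f"planned {planned}s -> {extends_needed(planned)} extend(s), "
          f"{raw}s raw, {raw - planned}s spare")
```

The spare time here is simply raw minus planned; the table's trim targets keep slightly less of it on extended shots (for example, 10s + 4s from a 15s clip). Shot 1.1 is handled differently in the table (planned 10s, no extend), so a rule like this would need an explicit override for it.
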
Extend Operations (5 shots)
| Shot | Planned | Strategy |
|---|---|---|
| 1.5 | 10s | 8s base + 1 extend (15s raw) → trim to ~14s |
| 2.1 | 10s | 8s base + 1 extend (15s raw) → trim to ~14s |
| 3.4 | 10s | 8s base + 1 extend (15s raw) → trim to ~14s |
| 3.5 | 10s | 8s base + 1 extend (15s raw) → trim to ~14s |
| 4.6 | 12s | 8s base + 1 extend (15s raw) → keep full 15s (12s + 3s overhang) |
Reference Image Budget Per Shot
Using from-frames, the start and end frames serve as the primary visual anchor. The motion prompt guides the interpolation. Character sheets and setting references are NOT needed for from-frames — the storyboard frames already encode character appearance and setting.
If from-frames produces poor results on specific shots, fall back to genmedia-video generate with these reference slots:
- Slot 1: Character sheet (Sarah or Mark)
- Slot 2: Character sheet (second character, if two-shot)
- Slot 3: Setting reference (diner-booth or diner-booth-dawn)
Audio Treatment Pipeline
[DIALOGUE] Shots (14 shots)
- Include dialogue text in motion prompt to trigger Veo native lip-sync
- Include speaker action descriptions (e.g., “speaking firmly”, “whispering tenderly”)
- No TTS overlay needed — Veo generates lip-synced audio natively
[COMPOUND] Shots (3 shots: 1.5, 2.1, 3.4)
- Include BOTH speakers’ dialogue in motion prompt, in sequence
- Veo generates lip-sync for sequential speakers
- These are all 10s shots requiring extend operations
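
As an illustration of the prompt rules for [DIALOGUE] and [COMPOUND] shots, the sketch below appends each speaker's action and quoted line, in order, to a visual description. The phrasing template, the Line type, and build_motion_prompt are assumptions for illustration; this plan does not prescribe an exact prompt format.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str  # e.g. "Sarah"
    action: str   # e.g. "speaking firmly"
    text: str     # the spoken dialogue


def build_motion_prompt(visual: str, lines: list[Line]) -> str:
    """Append each speaker's action and quoted dialogue, in order, to the visual description."""
    if not lines:
        raise ValueError("dialogue shots need at least one line")
    parts = [visual.rstrip(". ") + "."]
    for line in lines:
        parts.append(f'{line.speaker}, {line.action}, says: "{line.text}"')
    return " ".join(parts)


# Placeholder dialogue only; real lines come from the script.
prompt = build_motion_prompt(
    "Two-shot in the diner booth",
    [
        Line("Sarah", "speaking firmly", "<Sarah's line>"),
        Line("Mark", "whispering tenderly", "<Mark's line>"),
    ],
)
print(prompt)
```

A [DIALOGUE] shot passes a single Line; a [COMPOUND] shot passes two, and list order determines which speaker goes first.
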
[VO] Shots (2 shots: 1.1, 4.5)
- Generate video WITHOUT audio (--generate-audio=false)
- VO-Safe: No speaking actions in prompt, mouths closed
- Overlay pre-generated TTS stems in post:
  - 1.1: voice/lambda-narrator-sh1_1.wav (Fenrir, 9.36s)
  - 4.5: Need to generate Sarah’s VO (voice/lambda-sarah-offscreen-sh4_5.wav)
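
The VO-Safe rule could be checked before generation with a simple prompt lint. This is a rough sketch under stated assumptions: the term list is illustrative and not exhaustive, and vo_safe_violations is a hypothetical helper, not an existing tool in this pipeline.

```python
import re

# Assumption: an illustrative, non-exhaustive list of words that imply on-screen speech.
SPEAKING_TERMS = (
    "says", "speaks", "speaking", "talks", "talking",
    "whispers", "whispering", "shouts", "shouting",
)


def vo_safe_violations(prompt: str) -> list[str]:
    """Return speaking-related words found in the prompt (an empty list means VO-Safe)."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return sorted(term for term in SPEAKING_TERMS if term in words)
```

A keyword match like this only catches obvious cases; it does not replace a visual check that mouths stay closed in the generated clip.
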
[SILENT] Shots (9 shots)
- Generate video WITHOUT audio (--generate-audio=false)
- Ambient/SFX added in post via score or foley
Missing Audio Assets
| Shot | Need | Voice | Text |
|---|---|---|---|
| 4.5 | Sarah offscreen VO | TBD (Sarah’s voice) | “Take care of yourself, Mark.” |
Generation Order
Batch by extend requirement:
- Non-extend shots (23 shots): Generate via from-frames, 8s base clips
- Extend shots (5 shots): Generate 8s base, upload to GCS, extend via Veo 3.1 Lite
- Verify all: Run verify-dailies with shot manifest
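
Splitting the shot list into the two generation batches is a straightforward partition on the Extends column. The sketch below assumes a plain mapping from shot ID to extend count; batch_by_extend is an illustrative name, not an existing command.

```python
def batch_by_extend(shots: dict[str, int]) -> tuple[list[str], list[str]]:
    """Split shot IDs into (non-extend, extend) batches from a shot -> extend-count map."""
    non_extend = [shot for shot, extends in shots.items() if extends == 0]
    extend = [shot for shot, extends in shots.items() if extends > 0]
    return non_extend, extend


# A subset of the duration table, for illustration.
sample = {"1.1": 0, "1.5": 1, "2.1": 1, "3.3": 0, "4.6": 1}
non_extend, extend = batch_by_extend(sample)
print("from-frames only:", non_extend)
print("base + extend:", extend)
```

Run against the full duration table, this should yield the 23 and 5 shots listed above.
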
Post-Generation Gates
- genmedia-verify check-shots --dir ./dailies --min-duration 3 (basic quality check)
- verify-dailies --dir ./dailies --manifest shot-manifest.json (duration + extend validation)
- Visual review of character consistency against character sheets
- Editor’s dailies review
This strategy will be finalized after Step 4 sign-off.