Cinematography Constraints — Step 2 Input
From: Marcus Delaney (Tech Lead)
For: Jules (Idea Person) — use alongside Beat Sheet construction
Date: 2026-05-18
1. Shot Duration Strategy (Veo Capabilities)
Veo generates clips at 4s, 6s, or 8s base duration. The extend command (Veo 3.1 Lite only) adds exactly 7s per extension. With the Overhang Principle (2s pre-roll + 2s post-roll = 4s total), plan like this:
| Planned Screen Time | Generation Strategy | Raw Clip | Notes |
|---|---|---|---|
| 2–4s | Generate 8s, trim in post | 8s | Short, punchy shots (inserts, quick cuts) |
| 5–6s | Generate 8s, trim to 6s usable | 8s | Standard dialogue beat |
| 7–8s | Generate 8s, trim minimally | 8s | Holding shots, silences |
| 9–11s | Generate 8s + 1 extend (= 15s raw), trim | 15s | Use sparingly — long takes |
| 12–14s | Generate 8s + 1 extend, trim tight | 15s | Max before 2nd extend |
Rule of thumb: Keep most shots in the 4–6s planned range (generated as 8s clips). This gives the Editor maximum trim flexibility and keeps generation costs low. Reserve long holds (7–8s planned) and extended takes (9s+ planned, which need an extend) for key emotional beats.
For a 3:30–4:00 target runtime (210–240s), we need roughly 52–60 seconds of usable footage per scene on average across 4 scenes. That’s approximately 9–10 shots per scene at about 6s each.
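A minimal sketch of the duration table above as a lookup, in Python. The names `plan_generation` and `GenerationPlan` are illustrative and not part of any Veo tooling. The sketch encodes only the rows in the table (2–14s planned, 8s base clip, one 7s extend) and rejects anything outside that range.

```python
from dataclasses import dataclass

BASE_CLIP_S = 8     # base duration used for every row in the table
EXTEND_S = 7        # seconds added by one extend, per this memo
MIN_PLANNED_S = 2   # shortest planned screen time in the table
MAX_PLANNED_S = 14  # table stops here; longer shots need a 2nd extend


@dataclass(frozen=True)
class GenerationPlan:
    extends: int     # number of extend calls
    raw_clip_s: int  # raw footage generated, in seconds
    trim_s: int      # footage left over for the Editor to trim


def plan_generation(planned_s: int) -> GenerationPlan:
    """Map planned screen time to the generation strategy in the table."""
    if not MIN_PLANNED_S <= planned_s <= MAX_PLANNED_S:
        raise ValueError(f"{planned_s}s is outside the table's 2-14s range")
    extends = 0 if planned_s <= BASE_CLIP_S else 1
    raw_clip_s = BASE_CLIP_S + extends * EXTEND_S
    return GenerationPlan(extends, raw_clip_s, raw_clip_s - planned_s)
```

For example, `plan_generation(9)` gives one extend, 15s raw, and 6s to trim, while `plan_generation(5)` gives no extend, 8s raw, and 3s to trim.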
2. Camera Coverage Plan (Mapped to Tempo Structure)
Scene 1 — ALLEGRETTO (The Polite Arrival) | ~50–60s
| Coverage Type | Usage | Technical Notes |
|---|---|---|
| Wide establishing | Exterior diner in rain, then interior geography | Setting reference only — no characters or one silhouette |
| Medium two-shot | Sarah approaches booth, sits down | Both characters, emphasize table distance |
| Medium OTS (over-the-shoulder) | Polite exchanges | Alternating — Sarah’s shoulder/Mark speaking, then reverse |
| Insert close-up | Manila envelope landing on table, coffee pouring | Props only — no character reference needed, frees budget |
Pacing: Measured cuts, 5–6s per shot. Let the geometry of the diner breathe.
Scene 2 — ACCELERANDO (The First Crack) | ~60–75s
| Coverage Type | Usage | Technical Notes |
|---|---|---|
| Medium close-up (MCU) | Argument exchanges | Tighter framing — single character per shot |
| Quick-cut MCU pairs | Rapid-fire overlapping dialogue | 4s shots, fast alternation |
| Two-shot (tighter) | Physical distance collapsing as they lean in | Both characters, table between |
| Insert | Hands slamming table, coffee cups rattling | Prop detail, no character ref needed |
Pacing: Shots shorten as tension builds. Start at 6s, compress to 4s at the argument peak.
Scene 3 — ADAGIO (The Vulnerable Center) | ~60–75s
| Coverage Type | Usage | Technical Notes |
|---|---|---|
| Extreme close-up (ECU) | Faces — tears, micro-expressions | Single character, slow push-in |
| ECU insert | Hands touching across table, fork breaking pie | Emotional props |
| Medium two-shot | Shared laughter, the memory moment | Both characters, warmer framing |
| Slow push-in | Camera creeping closer during vulnerability | Use from-image with start/end frames for control |
Pacing: Longest shots of the film. 6–8s holds. Let silences land. This is where we earn the audience.
Scene 4 — CODA (The Resolution) | ~35–50s
| Coverage Type | Usage | Technical Notes |
|---|---|---|
| ECU insert | Pen signing papers — the scratch sound | Prop only |
| Medium two-shot | Final exchange across table | Both characters, softer framing |
| Medium tracking | Sarah walks to door, bell chimes | Single character |
| Wide | Mark alone in booth, dawn light through window | Single character + setting, new lighting |
| Wide exterior | Taillights fading, rain stopped, purple sky | Setting only — bookend with opening |
Pacing: Deliberate, measured. 5–6s per shot. Mirror the opening tempo but with resolution.
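As a quick sanity check, the per-scene targets in the scene headers above can be summed and compared with the 3:30–4:00 runtime target from Section 1. This is only a sketch of the arithmetic; the dictionary and function names are illustrative.

```python
# Per-scene screen-time targets in seconds, copied from the scene headers.
SCENE_TARGETS_S = {
    "Scene 1 ALLEGRETTO": (50, 60),
    "Scene 2 ACCELERANDO": (60, 75),
    "Scene 3 ADAGIO": (60, 75),
    "Scene 4 CODA": (35, 50),
}
TARGET_RUNTIME_S = (3 * 60 + 30, 4 * 60)  # 3:30-4:00


def total_range(targets):
    """Sum the low ends and the high ends of every scene range."""
    low = sum(lo for lo, _ in targets.values())
    high = sum(hi for _, hi in targets.values())
    return low, high


def format_mmss(seconds):
    """Render a whole number of seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


if __name__ == "__main__":
    low, high = total_range(SCENE_TARGETS_S)
    print(f"Scene targets total {format_mmss(low)}-{format_mmss(high)}")
    print(f"Runtime target is {format_mmss(TARGET_RUNTIME_S[0])}-"
          f"{format_mmss(TARGET_RUNTIME_S[1])}")
```

The scene targets total 205–260s (3:25–4:20), so they bracket the runtime target, but the cut overshoots 4:00 by up to 20s if every scene runs to the top of its range.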
3. Master Settings (for Setting Reference Images)
I will generate one reference image per Master Setting in Step 3. Define these in the scene list:
| Setting ID | Description | Used In |
|---|---|---|
| diner-exterior | Starlight Diner exterior: rain, blue neon “OPEN 24 HOURS” sign, wet parking lot, nighttime | Scenes 1, 4 |
| diner-booth | Interior booth: cherry red vinyl, chrome-edged Formica table, warm tungsten pendant light overhead, rain-streaked window beside booth, black-and-white checkered floor visible | Scenes 1, 2, 3, 4 |
| diner-booth-dawn | Same booth, but dawn light replaces rain: purple/amber sky through window, neon sign off, softer warmer light | Scene 4 (final shots) |
Note: diner-booth is our primary setting — 80%+ of shots will use this reference. The dawn variant signals temporal resolution without requiring a new location.
4. Reference Manifest Budget (Per Shot)
Veo allows max 3 reference images per shot. Our standard allocation:
| Slot | Content | When to Use |
|---|---|---|
| Ref 1 | Sarah’s character_sheet.png | Any shot featuring Sarah |
| Ref 2 | Mark’s character_sheet.png | Any shot featuring Mark |
| Ref 3 | Setting reference (diner-booth etc.) | All shots |
Special cases:
- Insert/prop shots (hands, coffee, envelope): 0–1 character refs + setting ref. Frees budget.
- Single-character shots: 1 character sheet + setting ref = 2 refs. Room for a prop reference if needed.
- Establishing exterior: Setting ref only. No character refs needed.
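The allocation rules above could be checked with a small helper like the following sketch. The folder layout (`characters/...`, `settings/...`) and the function name are hypothetical; only `character_sheet.png`, the setting IDs, and the 3-reference limit come from this memo.

```python
MAX_REFS_PER_SHOT = 3  # limit stated in this memo


def build_reference_manifest(characters, setting_id, prop_ref=None):
    """Return the ordered reference list for one shot.

    characters: names of characters visible in the shot (may be empty).
    setting_id: a Master Setting ID such as "diner-booth".
    prop_ref:   optional path to a prop reference image.
    """
    # Hypothetical paths: one character sheet per visible character.
    refs = [f"characters/{name.lower()}/character_sheet.png"
            for name in characters]
    # The setting reference is used on all shots.
    refs.append(f"settings/{setting_id}.png")
    if prop_ref is not None:
        refs.append(prop_ref)
    if len(refs) > MAX_REFS_PER_SHOT:
        raise ValueError(
            f"{len(refs)} references exceeds the limit of {MAX_REFS_PER_SHOT}"
        )
    return refs
```

A two-shot with Sarah and Mark in `diner-booth` fills all three slots, so adding a prop reference to it raises an error. A single-character shot leaves one slot free for a prop.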
5. Audio Classification Guidance (Dialogue-Heavy Mandate)
This pilot tests dialogue. Here’s how each classification maps to our cinematography:
[DIALOGUE] — Characters speaking on screen
- Camera: MCU or tighter on the speaking character. Mouth must be visible.
- Motion prompt: MUST include “character speaking,” “talking,” “animated conversation,” or “expressive gestures while talking.” This triggers Veo lip-sync.
- Audio: Veo generates native dialogue audio. We do NOT overlay TTS on these shots.
- Lip-sync rule: This is our primary mandate. Most shots should be [DIALOGUE].
[VO] — Narrator speaks over visuals
- Camera: Inserts, establishing shots, or characters doing NON-speaking actions (stirring coffee, looking out window, walking).
- Motion prompt: MUST enforce VO-Safe Rule. Use “mouth closed,” “silent contemplation,” “gazing,” “walking silently.” NEVER “cheering,” “shouting,” “talking.”
- Audio: TTS voiceover generated separately and overlaid in post.
- Usage: Opening/closing moments, transitions, emotional punctuation. Keep minimal — this is a dialogue film.
[COMPOUND] — Two characters speak sequentially in one shot
- Camera: Two-shot where both characters are visible and take turns speaking.
- Motion prompt: “Two people in animated conversation at a diner booth, taking turns speaking, expressive hand gestures.”
- Audio: Veo generates the audio with both voices.
- Mandate: The checklist requires at least 2–3 of these; we are targeting 3–4 (see the distribution table).
[SILENT] — No speech, ambient only
- Camera: Inserts (pie, envelope, rain), establishing shots, reaction holds.
- Motion prompt: Pure visual action. “Rain streaking down window,” “hand sliding envelope across table.”
- Audio: Ambient diner sounds baked in by Veo.
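The motion-prompt rules for [DIALOGUE] and [VO] lend themselves to a simple lint pass over the scene list. The sketch below is one way to do it, assuming plain case-insensitive substring matching. The function name is illustrative, and the phrase lists are copied from the rules above.

```python
# Phrases this memo says a [DIALOGUE] motion prompt must include (any one).
DIALOGUE_TRIGGERS = (
    "character speaking",
    "talking",
    "animated conversation",
    "expressive gestures while talking",
)
# Words this memo says a [VO] motion prompt must never include.
VO_FORBIDDEN = ("cheering", "shouting", "talking")


def check_motion_prompt(tag, prompt):
    """Return a list of rule violations; an empty list means it passes."""
    text = prompt.lower()
    problems = []
    if tag == "[DIALOGUE]" and not any(t in text for t in DIALOGUE_TRIGGERS):
        problems.append("[DIALOGUE] prompt lacks a speaking phrase")
    if tag == "[VO]":
        for word in VO_FORBIDDEN:
            if word in text:
                problems.append(f"[VO] prompt contains forbidden word: {word}")
    return problems
```

Substring matching is crude: it would flag "not talking" in a [VO] prompt, so treat a hit as a prompt to re-read the line rather than as a hard failure.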
Recommended Distribution for “The Last Diner”
| Type | Target Count | % of Shots |
|---|---|---|
| [DIALOGUE] | 18–22 shots | ~58% |
| [COMPOUND] | 3–4 shots | ~10% |
| [VO] | 3–5 shots | ~12% |
| [SILENT] | 6–8 shots | ~20% |
This gives us a dialogue-dominant film (roughly 68% of shots with speech on screen) while maintaining visual variety through inserts and establishing shots.
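Once the scene list is tagged, the actual mix can be tallied and compared with the targets. A minimal sketch, assuming the tags are collected into a plain list. The function names and the sample tally are hypothetical.

```python
from collections import Counter

# Tags that put speech on screen, per the classifications in this section.
ON_SCREEN_SPEECH = {"[DIALOGUE]", "[COMPOUND]"}


def tag_shares(tags):
    """Return each tag's share of the shot list as a percentage."""
    tags = list(tags)
    if not tags:
        raise ValueError("shot list is empty")
    counts = Counter(tags)
    return {tag: round(100 * n / len(tags), 1) for tag, n in counts.items()}


def on_screen_speech_share(tags):
    """Percentage of shots tagged [DIALOGUE] or [COMPOUND]."""
    tags = list(tags)
    if not tags:
        raise ValueError("shot list is empty")
    spoken = sum(1 for tag in tags if tag in ON_SCREEN_SPEECH)
    return round(100 * spoken / len(tags), 1)


if __name__ == "__main__":
    # Hypothetical tally that falls inside every target count range.
    sample = (["[DIALOGUE]"] * 20 + ["[COMPOUND]"] * 4
              + ["[VO]"] * 4 + ["[SILENT]"] * 7)
    print(tag_shares(sample))
    print(on_screen_speech_share(sample))
```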
6. Dialogue Content Guidance
For [DIALOGUE] and [COMPOUND] shots, the dialogue lines in the scene list are Veo prompting material — they tell the model what the characters should say, and Veo generates the audio natively with lip-sync. Keep lines:
- Short per shot: 1–3 sentences max. Veo handles short exchanges better than monologues.
- Distinct per character: Sarah is clipped, precise, uses full sentences. Mark is looser, interrupts, uses fragments and rhetorical questions. This helps Veo differentiate voices.
- Emotionally tagged: Include tone direction in the motion prompt (e.g., “speaking sharply,” “whispering tenderly,” “laughing while talking”). Veo uses this for vocal performance.
For longer exchanges, split across multiple shots with alternating coverage (MCU on Sarah → MCU on Mark → two-shot). This is standard dialogue coverage and plays to Veo’s strengths with single-character lip-sync.
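The splitting rule above can be sketched as a helper that turns an exchange into one single-character MCU shot per line and rejects any line over the 3-sentence limit. The function names are illustrative, the sentence counter is a rough regex heuristic, and the two-shot in the coverage pattern is left for manual placement.

```python
import re

MAX_SENTENCES_PER_SHOT = 3  # "1-3 sentences max" from this memo


def count_sentences(line):
    """Rough count: runs of . ! ? followed by whitespace or end of line."""
    endings = re.findall(r"[.!?]+(?=\s|$)", line.strip())
    return max(1, len(endings))


def plan_exchange(exchange):
    """Turn (speaker, line) pairs into one [DIALOGUE] MCU shot each."""
    shots = []
    for speaker, line in exchange:
        if count_sentences(line) > MAX_SENTENCES_PER_SHOT:
            raise ValueError(
                f"{speaker}'s line is too long for one shot; split it"
            )
        shots.append({
            "tag": "[DIALOGUE]",
            "coverage": f"MCU on {speaker}",
            "line": line,
        })
    return shots


if __name__ == "__main__":
    # Placeholder lines for illustration only, not script content.
    for shot in plan_exchange([("Sarah", "Line one. Line two."),
                               ("Mark", "A question? A fragment.")]):
        print(shot)
```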
Jules — this is your cinematography toolkit for the Beat Sheet. Use the camera coverage plan to assign shot types, the duration strategy to set planned times, and the audio classification guide to tag every shot. I’ll validate the Look once the scene list draft is ready.