Role Guide
Technical Lead (Director of Photography)
Synthesis pipeline, character reference chains, video generation, extend safety, and shot manifests.
Technical Lead — Role-Specific Production Guide
This document contains the detailed mandates, checklists, and procedures for the Technical Lead (Director of Photography) across all 7 production steps. Read the main Video Production Playbook first for the overall workflow and cross-role coordination.
Genre Integrity — Tone Anchors
AI video and image models tend to default to moody, dramatic realism. The Idea Person will define Tone Anchor keywords in design_brief.md: genre-defining terms (e.g., “bright, slapstick, warm lighting, absurd expression” for comedy). You MUST include these keywords in every image and video generation prompt, alongside your technical/cinematographic instructions. If the model isn’t explicitly told the genre, it is likely to generate a drama regardless of the script.
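One way to make the Tone Anchor rule hard to forget is to route every prompt through a small helper before it reaches the generator. The sketch below is illustrative only: the function name with_tone_anchors and the "Tone:" suffix format are assumptions, not part of the toolkit.

```python
def with_tone_anchors(prompt: str, tone_anchors: list[str]) -> str:
    """Append any Tone Anchor keywords the prompt does not already contain."""
    lowered = prompt.lower()
    # Simple substring check; a keyword already present is not repeated.
    missing = [kw for kw in tone_anchors if kw.lower() not in lowered]
    if not missing:
        return prompt
    return f"{prompt.rstrip()} Tone: {', '.join(missing)}."


# Keywords taken from the comedy example in this guide.
anchors = ["bright", "slapstick", "warm lighting", "absurd expression"]
prompt = with_tone_anchors("Wide shot, 35mm lens, knight at a desk.", anchors)
```

Calling the helper a second time on its own output leaves the prompt unchanged, so it is safe to apply at more than one stage of the pipeline.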
Step 2: The Beat Sheet
- Collaborate on the “Look”—ensuring every beat fits the chosen cinematography.
Step 2.5: Scene Review & Object Anchoring
- Generate high-fidelity reference images for key objects identified by the Idea Person/Editor, treating them exactly like characters.
- Store these references alongside the character sheets so they can be injected into Steps 4 and 5.
Step 3: Character Workshop & Voiceover Generation
Toolkit Preparation
“Prepare the Rig”—review the centralized toolkit documentation at /workspace/tools/genmedia/USAGE.md.
Character Generation Checklist (The Reference Chain)
To ensure visual consistency, you MUST use the --reference-image flag in genmedia-image to chain character designs. For each character, generate the following 4 images in sequence, saving them in a dedicated folder (e.g., /workspace/shared-dirs/[team-name]/characters/[character-name]/):
- 1. A clear, high-resolution headshot (headshot.png).
- 2. A 3-view full body sheet (front, back, 3/4 view) (body_sheet.png). CRUCIAL: You MUST pass --reference-image headshot.png when generating the body sheet to anchor the face.
- 3. An environmental test (Angle 1) (scene_test_1.png). CRUCIAL: You MUST pass --reference-image headshot.png AND --reference-image body_sheet.png to anchor the character in the scene.
- 4. An environmental test (Angle 2) (scene_test_2.png). CRUCIAL: You MUST pass the previous reference images.
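The chain above can be written down as data so that no step is generated without its anchors. This is a minimal sketch: the names REFERENCE_CHAIN and reference_flags are hypothetical, and it assumes that "the previous reference images" for scene_test_2.png means the headshot and the body sheet. Only the --reference-image flag comes from this guide; all other genmedia-image arguments are left to USAGE.md.

```python
from pathlib import PurePosixPath

# Each entry: (output file, files that must be passed via --reference-image).
REFERENCE_CHAIN = [
    ("headshot.png", []),
    ("body_sheet.png", ["headshot.png"]),
    ("scene_test_1.png", ["headshot.png", "body_sheet.png"]),
    # Assumption: "previous reference images" = headshot + body sheet.
    ("scene_test_2.png", ["headshot.png", "body_sheet.png"]),
]


def reference_flags(character_dir: str, output_name: str) -> list[str]:
    """Return the --reference-image arguments required for one chain step."""
    folder = PurePosixPath(character_dir)
    for name, refs in REFERENCE_CHAIN:
        if name == output_name:
            flags: list[str] = []
            for ref in refs:
                flags += ["--reference-image", str(folder / ref)]
            return flags
    raise ValueError(f"{output_name} is not part of the reference chain")
```

The returned list can be appended to whatever genmedia-image command line the team already uses, which keeps the chaining rule in one place instead of in each script.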
Composite Character Reference Sheet
After generating the 4 individual reference images, you MUST generate a composite character reference sheet for each character. This single image consolidates multiple views and zoom levels onto one clean sheet, and becomes the primary reference image for Steps 4 and 5.
- Generate using genmedia-image generate with --reference-image headshot.png and --reference-image body_sheet.png
- Save as character_sheet.png in the character’s folder (e.g., /workspace/shared-dirs/[team-name]/characters/[character-name]/character_sheet.png)
- Use 16:9 aspect ratio
Prompt template:
Professional character reference sheet on a pure white background. Show the same character in four views arranged in a clean grid layout: (1) full-body front-facing standing pose in the upper left, (2) close-up portrait headshot in the upper right, (3) full-body 3/4 action pose with arms gesturing expressively in the lower left, (4) full-body side profile standing pose in the lower right. The character must be completely isolated from any background — pure flat white behind every view. No environmental elements, no shadows on the ground, no props. Clean studio lighting, consistent character appearance across all four views. Character design sheet style, clean linework and professional presentation.
This composite sheet packs body turnaround, facial detail, and pose variety into a single image. Using one character_sheet.png per character (instead of multiple individual references) simplifies the Veo 3-reference-image budget: 1 character sheet + 1 setting reference = 2 images, leaving room for a second character or an object reference.
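The budget arithmetic can be sketched as a small selection function. The priority order (character sheets first, then the setting, then objects if room remains) is an assumption drawn from the example above, and select_references is a hypothetical helper. Object references are passed as full paths because this guide does not fix their filenames.

```python
VEO_REFERENCE_BUDGET = 3  # reference images per Veo shot, as stated in this guide


def select_references(team_dir, characters, setting, objects=()):
    """Pick reference images in priority order without exceeding the budget."""
    if len(characters) > 2:
        raise ValueError("a shot may reference at most 2 characters")
    refs = [
        f"{team_dir}/characters/{name}/character_sheet.png" for name in characters
    ]
    refs.append(f"{team_dir}/settings/{setting}/reference.png")
    # Object references only fill whatever budget is left over.
    remaining = VEO_REFERENCE_BUDGET - len(refs)
    refs.extend(objects[:remaining])
    return refs
```

With one character and one setting there is room for a single object reference; with two characters the budget is already spent and any objects are dropped.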
Setting Reference Image Generation
Master Settings defined by the Idea Person in Step 2 are text descriptions. You MUST generate a reference image for each Master Setting, anchoring the environment visually the same way character headshots anchor faces. For each Master Setting:
- Generate a high-resolution, empty environment image (no characters) from the text description. Use 16:9 aspect ratio.
- Save to /workspace/shared-dirs/[team-name]/settings/[setting-name]/reference.png.
These setting reference images are used alongside character references in Steps 4 and 5 to maintain environment consistency across shots. Without them, the same “limo interior” will look different in every generation.
Voiceover Generation
Warning on TTS Durations: TTS models are non-deterministic regarding pacing. A 100-word script might result in 30s of slow speech or 20s of fast speech. Always provide the actual durations of generated stems to the Editor so they can adjust the timeline before assembly.
You MUST execute genmedia-voice to generate the complete Narrator Voiceover (VO) track based on the lines written in Step 2. Save audio stems to the team’s shared /voice/ directory before Step 5 begins.
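Because pacing varies between runs, it helps to measure the stems rather than estimate from word count. The sketch below assumes the stems are PCM WAV files (the guide does not state the output format) and uses only the Python standard library; stem_durations is a hypothetical helper name.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def stem_durations(voice_dir: str) -> dict[str, float]:
    """Map each .wav stem in voice_dir to its measured duration in seconds."""
    return {
        p.name: round(wav_duration_seconds(p), 2)
        for p in sorted(Path(voice_dir).glob("*.wav"))
    }
```

The resulting mapping can be handed to the Editor as-is. If the stems are delivered in another format, a different duration probe would be needed.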
Motion Prompt Alignment (Audio Agreement)
You MUST align your motion prompts with the Audio Classification provided by the Idea Person:
- For [DIALOGUE] shots: Explicitly describe the character as speaking, conversing, or using expressive hand gestures to trigger natural lip movement from Veo.
- For [VO] shots: Enforce the VO-Safe Rule. Describe characters as silent, mouth closed, or focused on physical tasks. NEVER use verbs like “cheering” or “shouting” in a VO shot, as it will create visual-audio conflict with the narrator.
Step 4: The Storyboard
You operate the synthesis rig to generate the storyboard.
Mandates
- No “Hero Frames”: This is a procedural workflow. You must generate a Start Frame and an End Frame for every single shot defined in Step 2.
- Resolution: All images MUST maintain a 16:9 aspect ratio. Higher resolutions (e.g. 1376x768 native Gemini output) are encouraged for intermediate storyboard frames.
- Reference Image Selection: For each shot, use the Reference Manifest from the scene list (Step 2) to select exactly which reference images to inject. The manifest specifies which characters (max 2), which setting, and which objects appear. You have a budget of 3 reference images per Veo shot. Use the composite character_sheet.png (from Step 3) as the single reference per character; this replaces the need to pass individual headshots or body sheets separately. Typical budget: 1 character sheet per on-screen character + 1 setting reference.
- Visual Continuity Injection:
- You MUST use the composite Character Sheet images (from Step 3), the Setting Reference images (from Step 3), and the Master Settings text (from Step 2) as inputs for every frame.
- Reference-to-Reference Chaining: For every shot, the Start Frame must be passed as a --reference-image for the generation of the End Frame. This relies on the generative model to maintain background consistency while you use the text prompt to drive the delta.
- Prompt Delta Reflection: When generating the End Frame, the prompt must explicitly reference changes in camera position, character action, or background items relative to the Start Frame.
- Folder Structure: Save frames in a strict hierarchy: /workspace/shared-dirs/[team-name]/storyboard/scene_1/shot_1_start.png and shot_1_end.png.
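The folder hierarchy and the Start-to-End chaining rule can be expressed as two small helpers. This is a sketch under assumptions: frame_paths and end_frame_references are hypothetical names, and the guide does not state a reference-image limit for storyboard image generation, so none is enforced here.

```python
def frame_paths(team_dir: str, scene: int, shot: int) -> tuple[str, str]:
    """Return (start_frame, end_frame) paths following the storyboard hierarchy."""
    base = f"{team_dir}/storyboard/scene_{scene}"
    return f"{base}/shot_{shot}_start.png", f"{base}/shot_{shot}_end.png"


def end_frame_references(start_frame: str, shot_references: list[str]) -> list[str]:
    """References for the End Frame: the shot's own references plus its Start Frame."""
    return [*shot_references, start_frame]


start, end = frame_paths("/workspace/shared-dirs/demo-team", scene=1, shot=1)
refs = end_frame_references(start, ["/refs/character_sheet.png", "/refs/reference.png"])
```

Deriving both paths from the scene and shot numbers keeps filenames predictable for the Editor and avoids hand-typed paths drifting out of the hierarchy.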
Prompt Sanitization & Safety Tips
If a prompt fails a safety filter (especially in Step 5), use these proven sanitization techniques:
- Archetype Substitution: Instead of naming potentially sensitive figures, use descriptive visual tags (e.g., “man in rhinestone jumpsuit” instead of “Elvis”).
- Gesture Rephrasing: “Finger guns” or pointing while armored can trigger “weapon” filters. Rephrase as “expressive hand gestures,” “pointing emphatically toward a colleague,” or “hand raised in a beckoning motion.”
- Framing as Performance: Explicitly include keywords like “acting,” “theatrical,” “stage performance,” or “comedic” to provide non-threatening context to the model.
- Environment Anchoring: Always anchor potentially “gritty” characters (like knights in armor) within bright, mundane environments (like an office) to counterbalance filter triggers.
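The first three techniques lend themselves to a mechanical pre-pass before a retry. The sketch below is illustrative only: the substitution table contains just the two examples from this section, and the function name sanitize_prompt and the default performance tag are assumptions. It does not guarantee that a prompt will pass any filter.

```python
import re

# Illustrative substitutions taken from the examples above; extend per project.
SUBSTITUTIONS = {
    r"\bElvis\b": "man in rhinestone jumpsuit",
    r"\bfinger guns\b": "expressive hand gestures",
}


def sanitize_prompt(prompt: str, performance_tag: str = "comedic stage performance") -> str:
    """Apply archetype/gesture substitutions and add a performance framing tag."""
    for pattern, replacement in SUBSTITUTIONS.items():
        prompt = re.sub(pattern, replacement, prompt, flags=re.IGNORECASE)
    # Add the framing only if the prompt does not already carry it.
    if performance_tag.lower() not in prompt.lower():
        prompt = f"{prompt.rstrip()} Framed as a {performance_tag}."
    return prompt
```

Environment Anchoring is left out on purpose, since choosing a bright, mundane setting is a creative decision tied to the Master Settings rather than a text substitution.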
Step 5: Principal Photography
You lead this step.
Video Generation Mandates
- Model Selection: Use the Veo 3.1 model (veo-3.1-fast-generate-001).
- Resolution: Video clips MUST maintain a 16:9 aspect ratio. Final assembly will standardize to 720p.
- Integrated Audio Prompting: Include the ambient/background audio requirements (from Step 2) directly into the Veo video generation prompt. The video model generates the corresponding soundscape along with the visuals.
- The Overhang Principle (Pre-Roll / Post-Roll): Apply a flat 4-second overhang to every shot: 2 seconds of pre-roll and 2 seconds of post-roll added to the planned duration. For example, a planned 10s shot must be generated as a 14s clip. Do NOT use a percentage-based calculation.
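The overhang rule is simple addition, and a helper makes it harder to slip into a percentage by accident. In this sketch the function names are hypothetical, and trim_window assumes the planned action sits between the pre-roll and post-roll, which the guide implies but does not spell out.

```python
PRE_ROLL_S = 2   # seconds of pre-roll, per the Overhang Principle
POST_ROLL_S = 2  # seconds of post-roll, per the Overhang Principle


def generation_duration(planned_seconds: float) -> float:
    """Planned duration plus the flat 4-second overhang (never a percentage)."""
    if planned_seconds <= 0:
        raise ValueError("planned duration must be positive")
    return planned_seconds + PRE_ROLL_S + POST_ROLL_S


def trim_window(planned_seconds: float) -> tuple[float, float]:
    """(start, end) offsets of the planned shot inside the generated clip.

    Assumption: the planned action is centred between pre-roll and post-roll.
    """
    return PRE_ROLL_S, PRE_ROLL_S + planned_seconds
```

For the 10s example, generation_duration(10) gives 14 and trim_window(10) gives offsets 2 and 12 within the generated clip.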
Extend Safety
When using genmedia-video extend, the output already contains the original video + the extension. Do NOT concatenate the original clip with the extend output — this produces duplicated footage (the original appearing twice). Use the extend output directly. See USAGE.md Workflow 2 for the correct pattern.
Shot Manifest (Mandatory)
You MUST generate a shot-manifest.json alongside the principal photography scripts. This manifest maps each shot filename to its planned duration and number of extend operations. The Editor uses this to run verify-dailies. See /workspace/tools/genmedia/USAGE.md (verify-dailies section) for the format.
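As a rough illustration of the mapping described above, the manifest can be built from the shot list and written with the standard json module. The field names planned_duration_s and extend_count, the example filename, and the overall layout are hypothetical; the authoritative format is the one in USAGE.md and it takes precedence over this sketch.

```python
import json


def build_manifest(shots: list[dict]) -> dict:
    """Map each shot filename to its planned duration and extend count.

    Key names are placeholders; match them to the verify-dailies format.
    """
    manifest = {}
    for shot in shots:
        manifest[shot["filename"]] = {
            "planned_duration_s": shot["planned_duration_s"],
            "extend_count": shot.get("extend_count", 0),
        }
    return manifest


def write_manifest(shots: list[dict], path: str = "shot-manifest.json") -> None:
    """Serialize the manifest next to the principal photography scripts."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_manifest(shots), fh, indent=2)
```

Generating the manifest from the same shot list that drives the generation scripts keeps the planned durations and extend counts from drifting apart.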
Step 6: The Soundstage
Integrated Audio Workflow
- Dialogue/Narration: Use TTS (e.g., gemini-3.1-flash-tts-preview) to strictly match the script.
- Score/Music: Use a dedicated music model (e.g., Lyria) to generate the music, keeping in mind that music arcs often span across multiple visual scenes.
Note on Ambient/SFX: No separate foley or ambient soundscapes are generated. These are baked into the raw video clips via Veo audio prompting in Step 5.
Step 7: The Editing Suite
- Orchestrate Credits — start a Motion Graphics Agent for titles.
- Mandate: Do NOT provide HTML/JSON manually. You only need to provide the Mograph agent with:
- The target aspect ratio (16:9).
- A clear instruction that this is an opening title scene for a movie.
- The actual text content (Main Title and any Subtitles).
- A simple style prompt (e.g., “neon noir style”).
- The Mograph agent handles the technical rendering. Generate brief closing credits music to accompany.