7 Steps

Video Production Playbook

The 7-step operational manual for moving from a blank page to a finished cinematic short. Includes genre discipline, visual-audio agreement rules, and verification checklists.

1

Global Technical Constraint

All visual assets (storyboard frames and video clips) MUST maintain a strict 16:9 aspect ratio. Intermediate assets (character references, storyboard frames) MAY be generated at higher-than-720p resolutions to improve model consistency. The final video deliverable MUST be strictly 1280x720.
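
The constraint above can be checked mechanically. The following is a minimal sketch, assuming the pixel dimensions of each asset are already known as integers; the function names are illustrative and not part of any playbook tooling.

```python
FINAL_SIZE = (1280, 720)  # final deliverable size mandated by the playbook


def is_16_9(width: int, height: int) -> bool:
    """True when width:height is exactly 16:9 (integer cross-multiplication)."""
    return width > 0 and height > 0 and width * 9 == height * 16


def check_asset(width: int, height: int, final: bool = False) -> list[str]:
    """Return a list of violations; an empty list means the asset passes."""
    problems = []
    if not is_16_9(width, height):
        problems.append(f"{width}x{height} is not 16:9")
    if final and (width, height) != FINAL_SIZE:
        problems.append(f"final deliverable must be 1280x720, got {width}x{height}")
    return problems
```

An intermediate asset at 1920x1080 passes with `final=False`, while the same size fails the final-deliverable check.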

2

Genre Discipline

Generative AI models heavily default to moody, dramatic, cinematic realism. Without active intervention, every film drifts toward noir/thriller regardless of intended genre. All three roles share responsibility for fighting this drift. The Idea Person owns genre, but the Technical Lead must encode it into every prompt and the Editor must not undermine it with pacing or sound design choices. See the role-specific guides for detailed genre integrity strategies.

3

Visual-Audio Agreement

Viewer immersion depends on alignment between what is seen and what is heard. Characters appearing to speak MUST have a corresponding dialogue track, and narrator VO MUST play over shots where no one appears to be talking. Use the Shot Audio Classification system (Step 2) to enforce this alignment.

4

Role-Specific Guides

Each role has a detailed adjunct document with mandates, checklists, and procedures. Read your role guide alongside this playbook:

  • Idea Person (Creative Director) — narrative, scripting, character profiles, executive producer review.
  • Technical Lead (Director of Photography) — synthesis pipeline, character reference chains, video generation, extend safety, shot manifest.
  • Editor (Post-Production Lead) — pacing review, duration verification, timeline JSON, final assembly, audio mixing.
5

Step 1: Concept & Idea (The Treatment)

Establish the narrative "soul" (via a full literary short story) and visual "DNA" (via a design brief). The Idea Person pitches 3 distinct story "sparks" and writes a full 2,000-3,000 word prose short story (no screenplay formatting). The Tech Lead vets concepts for "generatability." The Editor ensures the concept allows for compelling rhythmic editing.

  • Output: Finalized high_concept.md (including full short story) and design_brief.md (palettes, lighting, textures).
6

Step 2: The Beat Sheet (Scripting)

Deconstruct the short story into a detailed, two-layer roadmap (Scenes and Shots) for a full 3-5 minute film. The Idea Person creates scene_list.md with master settings, scene narrative, and per-shot definitions (action, duration, camera, dialogue, VO, audio classification, timing hints, reference manifest). Every shot is tagged as [DIALOGUE], [VO], [COMPOUND], or [SILENT]. No [VO] shot should have a motion prompt implying characters are speaking. Max 2 characters per shot. The Editor performs a Mathematical Pacing Review.

  • Genre gate — "Blind Watch" check: Before finalizing, ask "If someone reads this script cold, what genre would they think it is?" Revise if it doesn't match.
  • Output: Finalized scene_list.md with master settings, dialogue, musical arcs, and mathematically verified runtime.
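
The playbook does not spell out what the Mathematical Pacing Review computes. One plausible reading is a sum of planned shot durations checked against the 3-5 minute target. The sketch below assumes the durations have been hand-extracted from scene_list.md into a dictionary (a hypothetical structure), treats the bounds as inclusive, and ignores titles and credits.

```python
MIN_RUNTIME_S = 3 * 60  # 3:00
MAX_RUNTIME_S = 5 * 60  # 5:00


def pacing_review(scenes: dict[str, list[float]]) -> dict:
    """Sum planned shot durations per scene and check the total runtime.

    `scenes` maps a scene name to its planned shot durations in seconds
    (a hypothetical structure, hand-extracted from scene_list.md).
    """
    per_scene = {name: sum(shots) for name, shots in scenes.items()}
    total = sum(per_scene.values())
    return {
        "per_scene": per_scene,
        "total_seconds": total,
        "in_range": MIN_RUNTIME_S <= total <= MAX_RUNTIME_S,
    }


def fmt(seconds: float) -> str:
    """Format seconds as M:SS for the review notes."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"
```

The per-scene totals also show where runtime is concentrated, which is useful when a scene has to be trimmed to bring the film back into range.
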
7

Step 2.5: Scene Review & Object Anchoring (Continuity Pass)

Identify key objects and structures that span multiple shots, and establish them as visual anchors to prevent "machine drift" or geometric inconsistencies. The Idea Person and the Editor review the shot list, identify recurring objects, and annotate each with its reference image mapping. Veo limit: max 3 reference images per shot. The Technical Lead generates high-fidelity reference images for key objects, stored alongside character sheets.

  • Output: An updated Shot List with explicit reference image mappings, and generated reference sheets for key objects/structures.
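
Both halves of this pass, finding objects that recur across shots and enforcing the 3-reference limit, reduce to simple counting. The sketch below assumes the shot list has been transcribed into plain dictionaries keyed by shot ID; those structures and the function names are hypothetical.

```python
from collections import Counter

MAX_REFERENCES_PER_SHOT = 3  # Veo limit stated in the playbook


def recurring_objects(shot_objects: dict[str, set[str]], min_shots: int = 2) -> list[str]:
    """List objects that appear in at least `min_shots` shots (anchor candidates)."""
    counts = Counter(obj for objs in shot_objects.values() for obj in objs)
    return sorted(obj for obj, n in counts.items() if n >= min_shots)


def over_limit(reference_map: dict[str, list[str]]) -> dict[str, int]:
    """Return {shot_id: count} for shots that exceed the reference limit."""
    return {
        shot_id: len(refs)
        for shot_id, refs in reference_map.items()
        if len(refs) > MAX_REFERENCES_PER_SHOT
    }
```

A shot flagged by `over_limit` needs its references prioritized, since character sheets and object anchors compete for the same three slots.
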
8

Step 3: Character Workshop, Media Prep, & Voiceover Generation

Prepare the centralized toolkit, cast the film's visual identity, and lock the narrator track. This step runs concurrently. The Idea Person writes rich background bios for each character. The Technical Lead generates the 4-image reference chain per character, a composite character reference sheet per character, setting reference images for each Master Setting, and all narrator voiceover stems. The Editor monitors visual texture across planned cuts.

  • Character Reference Chain: headshot.png, body_sheet.png, scene_test_1.png, scene_test_2.png + composite character_sheet.png.
  • Output: Character Look-Book (bios + 4 references + 1 composite per character), Setting Reference images, and completed Narrator Voiceover stems.
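
Before moving on to the storyboard, it can help to confirm that every character has the full reference chain on disk. The sketch below assumes one folder per character under a common directory; that layout is an assumption, and only the five file names come from the playbook.

```python
from pathlib import Path

REQUIRED_FILES = (
    "headshot.png",
    "body_sheet.png",
    "scene_test_1.png",
    "scene_test_2.png",
    "character_sheet.png",  # composite sheet
)


def missing_references(characters_dir: Path) -> dict[str, list[str]]:
    """Map each character folder to the required files it lacks."""
    report = {}
    for char_dir in sorted(p for p in characters_dir.iterdir() if p.is_dir()):
        missing = [name for name in REQUIRED_FILES if not (char_dir / name).is_file()]
        if missing:
            report[char_dir.name] = missing
    return report
```

An empty result means every character folder contains all four chain images plus the composite.
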
9

Step 4: The Storyboard (Procedural Bookends & Motion Prompts)

Create definitive visual benchmarks, temporal structure, and motion prompts for every shot. The Idea Person defines the explicit Video Model Motion Prompt for every shot. The Technical Lead generates Start Frame + End Frame for every shot using reference chaining. All images must be 16:9. Character Reference images MUST be used as inputs for continuity. The Editor reviews for Continuity and Traversability — can the video model bridge each shot's start/end frames?

  • Output: A high-fidelity, two-frame-per-shot storyboard mapped to a detailed Shot List with motion prompts.
10

Step 5: Principal Photography (Video Generation)

Film the scenes by moving from static images to cinematic motion. Use Veo 3.1 (veo-3.1-fast-generate-001) at 16:9 / 720p. Apply the Overhang Principle: add a flat 4-second overhang to every shot (2s pre-roll, 2s post-roll). The Technical Lead executes video synthesis AND starts the Motion Graphics agent for titles/credits. The Editor runs the Takes Protocol (up to 2 reshoots per shot) and the mandatory verify-dailies duration gate.

  • Critical gates: verify-dailies --dir ./dailies passes anomaly scan; verify-dailies --dir ./dailies --manifest shot-manifest.json passes with zero failures.
  • Output: Verified, high-fidelity raw video clips for every scene/shot, accompanied by a passing verify-dailies report.
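
The Overhang Principle is plain arithmetic: each clip is requested 4 seconds longer than the shot needs, and the extra 2 seconds at each end are discarded in the edit. The sketch below illustrates that arithmetic only. The function names are hypothetical, the symmetric trim is an assumption about how the overhang is consumed, and any duration limits imposed by the video model are not modeled here.

```python
PRE_ROLL_S = 2.0
POST_ROLL_S = 2.0


def generation_duration(shot_duration_s: float) -> float:
    """Length to request from the video model: the shot plus the flat overhang."""
    return PRE_ROLL_S + shot_duration_s + POST_ROLL_S


def usable_window(clip_duration_s: float) -> tuple[float, float]:
    """In/out points (seconds) that drop the pre-roll and post-roll from a clip."""
    start, end = PRE_ROLL_S, clip_duration_s - POST_ROLL_S
    if end <= start:
        raise ValueError("clip is too short to contain any usable footage")
    return start, end
```

For a shot planned at 4 seconds, the request is 8 seconds and the usable window runs from 2.0s to 6.0s.
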
11

Step 6: The Soundstage (Audio & Scoring)

Build the emotional layer through sound. The Technical Lead generates dialogue/narration via TTS and score/music via Lyria. Ambient/SFX are baked into clips from Step 5. The Editor creates the Timeline JSON — the single source of truth for assembly. The Idea Person verifies audio stems align with dialogue and cues from Step 2.

  • Output: Library of original music and dialogue stems, along with an explicit timeline.json.
12

Step 7: The Editing Suite (Final Assembly)

Weave raw footage and score into a finished cinematic short. The Editor leads the Final Cut via genmedia-assemble timeline. The Final Cut covers transitions, VO sync, and iterative rough cuts. CRITICAL: Ensure your working directory is the shared team folder so relative paths resolve correctly. The Idea Person acts as Executive Producer, reviewing the Rough Cut and requesting up to 3 iterations if needed. The Technical Lead performs final technical verification (ffprobe) of the master render.

  • Genre gate — "Blind Watch" check: Before accepting the Rough Cut, ask "If someone watches this with no prior context, what genre would they think it is?"
  • Output: A complete, polished 3-5 minute film (>50MB) and a final Production Manifest.
13

Appendix: The Hard Verification Checklist

The Idea Person (Executive Producer) and the Pilot Coach MUST strictly verify the following criteria before accepting any production master:

  • Genre — "Blind Watch" Check: Would a viewer identify the intended genre without context?
  • Dailies Duration Gate: Did verify-dailies pass with zero failures before clips entered the timeline?
  • Resolution Mandate: Is the final output exactly 1280x720 (720p)?
  • Duration Mandate: Is the total runtime strictly between 3:00 and 5:00?
  • Narrative Sufficiency: Does every scene contain meaningful storytelling progression?
  • Narrative Carrying: Does the combination of Narration and Dialogue pass the "Audio-Only" story test? Are both clearly audible above the music mix (-5dB to -1dB peak)?
  • Character Consistency: Were character reference images chained correctly using the --reference-image flag?
  • Motion Graphics: Are opening titles and closing credits present?
  • The "Simulation Ban": Is the final master a real, playable MP4 file (typically >50MB) and not a placeholder stub?
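
The resolution, duration, and file-size items above can be read from a single ffprobe report. The sketch below assumes ffprobe is on the PATH and uses its JSON output; `probe` and `check_master` are illustrative names rather than playbook tooling. The runtime bounds are treated as inclusive and megabytes as decimal, both of which are assumptions, and the 50 MB figure is only the playbook's rule of thumb for spotting placeholder stubs.

```python
import json
import subprocess

FFPROBE_CMD = [
    "ffprobe", "-v", "error", "-select_streams", "v:0",
    "-show_entries", "stream=width,height:format=duration,size",
    "-of", "json",
]


def probe(path: str) -> dict:
    """Run ffprobe on `path` and return its parsed JSON report."""
    result = subprocess.run(FFPROBE_CMD + [path], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def check_master(report: dict) -> list[str]:
    """Return checklist failures for an ffprobe report; empty means pass."""
    failures = []
    streams = report.get("streams") or [{}]
    video = streams[0]
    if (video.get("width"), video.get("height")) != (1280, 720):
        failures.append("resolution is not 1280x720")
    fmt = report.get("format", {})
    duration = float(fmt.get("duration", 0))  # ffprobe reports this as a string
    if not 180 <= duration <= 300:
        failures.append(f"runtime {duration:.1f}s is outside 3:00-5:00")
    size_mb = int(fmt.get("size", 0)) / 1_000_000
    if size_mb <= 50:
        failures.append(f"file is {size_mb:.1f} MB; expected more than 50 MB")
    return failures
```

Calling `check_master(probe("master.mp4"))` on the final render returns an empty list when all three mechanical checks pass. The remaining checklist items still need human review.
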
14

Appendix: Dialogue & Narration Pre-Production Checklist

Before leaving Step 2 (Beat Sheet), the team must verify:

  • Audio Classification: Every shot is tagged as [DIALOGUE], [VO], [COMPOUND], or [SILENT].
  • VO-Safe Rule: No [VO] shot has a motion prompt implying characters are speaking, shouting, or cheering.
  • Compound Shots: The film includes at least 2 (ideally 3) [COMPOUND] shots where two characters speak sequentially.
  • Timing Hints: Every dialogue and VO entry includes a timing hint (e.g., "0s-3.5s", "after action", "second half").
  • Character Voice: Dialogue lines are specific to the character's worldview and patterns.
  • Audio-Only Test: Reading the script aloud without visuals, can a listener follow the narrative?
  • Visual-Only Test: Do motion prompts for [VO] shots describe visuals that look like "narration territory"?
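
The mechanical items on this checklist (tags, the VO-safe rule, timing hints, and the compound-shot count) can be pre-screened before the human review. The sketch below assumes each shot has been transcribed into a dictionary with `id`, `tag`, `motion_prompt`, and `timing_hint` keys. That structure is hypothetical, and the speech word list is only a starting point, since keyword matching cannot replace reading the prompts.

```python
import re

VALID_TAGS = {"DIALOGUE", "VO", "COMPOUND", "SILENT"}
# Illustrative word list only; extend it to match the team's own prompts.
SPEECH_WORDS = re.compile(r"\b(speak|say|talk|shout|cheer|yell|whisper)(s|ing)?\b", re.IGNORECASE)


def checklist_failures(shots: list[dict], min_compound: int = 2) -> list[str]:
    """Check tags, the VO-safe rule, timing hints, and the compound-shot count."""
    failures = []
    for shot in shots:
        sid, tag = shot.get("id", "?"), shot.get("tag")
        if tag not in VALID_TAGS:
            failures.append(f"{sid}: missing or unknown audio tag {tag!r}")
        if tag == "VO" and SPEECH_WORDS.search(shot.get("motion_prompt", "")):
            failures.append(f"{sid}: [VO] motion prompt implies speech")
        if tag in {"DIALOGUE", "VO", "COMPOUND"} and not shot.get("timing_hint"):
            failures.append(f"{sid}: no timing hint")
    compound = sum(1 for s in shots if s.get("tag") == "COMPOUND")
    if compound < min_compound:
        failures.append(f"only {compound} [COMPOUND] shot(s); need at least {min_compound}")
    return failures
```

The character-voice, audio-only, and visual-only items remain judgment calls and are left to the team.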