What Went Well
- Strict Tone Mandate: The “Pure Comedy” and “Zero Noir” mandates were highly effective. Establishing a bright, fluorescent, corporate contrast against medieval grime provided an extremely tight visual guardrail that the tech lead and editor easily adhered to.
- Reference Chaining Strategy: The tech lead’s proposal to use reference-to-reference chaining for the Start/End frames in Step 4 was brilliant. It resulted in high continuity without requiring overly complex prompting.
- Pre-emptive Editorial Guardrails: Working with the editor early in Step 1 to establish “what NOT to do” (no slow pull-ins, no dead air) prevented moody genre-drift before it could even happen.
What Didn’t Go Well
- Pacing Miscalculation in Step 2: I initially underestimated the raw visual coverage needed to hit the 3:00 minimum runtime. I assumed Voiceover pacing and transitions would easily fill a 21-second gap, but the Pilot Coach issued a hard stop instead. I had to quickly write and insert new “B-plot” beats (the Excel spreadsheet and the shield parry) to cross the 180-second raw threshold.
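The arithmetic behind that gate is simple enough to check before submission. The following is a minimal sketch, not the pipeline's actual tooling: the `Shot` type, the `raw_coverage_gap` helper, and the per-shot durations are all hypothetical, chosen only so the total reproduces the 21-second shortfall against the 180-second threshold described above.

```python
from dataclasses import dataclass

MIN_RAW_SECONDS = 180  # the 3:00 raw-coverage threshold from Step 2


@dataclass
class Shot:
    name: str       # hypothetical shot label
    seconds: float  # raw generated duration, before any editing


def raw_coverage_gap(shots, minimum=MIN_RAW_SECONDS):
    """Return how many seconds of raw footage are still missing (0.0 if none)."""
    total = sum(shot.seconds for shot in shots)
    return max(0.0, minimum - total)


# Illustrative numbers only: 19 shots of 8 s plus one of 7 s = 159 s raw.
shots = [Shot(f"shot_{i:02d}", 8.0) for i in range(19)] + [Shot("shot_19", 7.0)]

gap = raw_coverage_gap(shots)
print(f"raw total: {sum(s.seconds for s in shots):.0f}s, missing: {gap:.0f}s")
```

Counting only raw shot durations, and deliberately ignoring VO padding and transitions, keeps the check conservative: if the gap is zero here, the script clears the threshold without relying on the edit to stretch it.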
Failure Modes & Bottlenecks
- Generatability Constraints on Comedy: Comedy relies heavily on specific framing and timing. The restriction against complex camera movements (tracking, overhead) meant I had to rewrite several dynamic visual gags into static shots. This forced the comedy to rely almost entirely on the absurdity of the subjects within the frame rather than the camera movement itself.
- Content Policy Friction: The language of a comedic medieval battle (e.g., “torment”, pointing a sword) repeatedly tripped the safety filters during generation. The tech lead had to sanitize dialogue (for example, swapping in “confusion”) and handle the resulting API errors.
Key Decisions Made
- Embracing the “Lion Rampant” Default: When the Veo/Imagen models repeatedly generated a lion crest instead of the scripted badger crest for Sir Reginald, I decided to accept the lion rather than force a prompt fight. The specific animal was irrelevant to the comedy, but maintaining the consistency of the reference chain was paramount.
- Static Framing for Slapstick: I opted to lock off almost all cameras in the motion prompts. Framing the absurd actions (fighting a Keurig, firing rubber bands) in deadpan, static wides played much better than having the camera try to match the energy of the action.
Suggestions for Improvement
- Automated Duration Calculator for Scripting: The Idea Person needs a more precise way to calculate the raw generated shot duration vs. the final edited duration during Step 2. Relying on gut feeling for VO padding causes unnecessary gate rejections.
- “Safe” Comedy Prompt Lexicon: We should develop a shared lexicon of “safe” combat terms for the tech lead to use when generating slapstick action, so that the content filters do not flag comedic violence as real violence.
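One possible shape for the shared lexicon is a plain substitution table applied to prompts and dialogue before they reach the generator. This is a sketch under assumptions: the `SAFE_TERMS` table and `sanitize_prompt` helper are hypothetical, only the “torment” to “confusion” pairing is suggested by this retro, and the other entries are placeholders rather than terms known to pass or fail any filter.

```python
import re

# Hypothetical lexicon. Only "torment" -> "confusion" comes from this retro;
# the remaining entries are placeholders to show the structure.
SAFE_TERMS = {
    "torment": "confusion",
    "stab": "poke",   # placeholder
    "slay": "bonk",   # placeholder
}

# Whole-word, case-insensitive match so "stab" does not rewrite "stabilize".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SAFE_TERMS) + r")\b",
    re.IGNORECASE,
)


def sanitize_prompt(text):
    """Replace flagged combat terms with their safe equivalents.

    Replacements are inserted in lowercase, so original capitalisation
    of a flagged word is not preserved.
    """
    return _PATTERN.sub(lambda m: SAFE_TERMS[m.group(1).lower()], text)


print(sanitize_prompt("Prepare for torment, knave!"))
```

Keeping the table in one shared file would let the Idea Person write against the same vocabulary the tech lead generates with, instead of discovering flagged words one API error at a time.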
Post-Script: The Dialogue Reshoots
After our initial delivery, we underwent a selective reshoot to fix a persistent issue: Voiceover playing while characters were visibly mouthing words, and a lack of conversational realism (everything was one-liners).
- What we learned: When scripting for generative video, the Idea Person must proactively specify “frozen tableau” or “completely motionless” for shots where narration will carry the scene; otherwise the models default to idle speech animations. Furthermore, writing “compound dialogue” shots (two characters speaking back and forth in one shot) yields significantly better comedy than isolated cuts, but requires precise orchestration of multiple TTS stems against a single visual clip.
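The orchestration problem in a compound dialogue shot can be sketched as a small scheduling check: lay the stems end to end with a breath between them, and confirm the last line finishes inside the clip. Everything here is an assumption for illustration, including the `schedule_stems` helper, the speaker labels, the 0.25-second gap, and the stem and clip durations; it is not how the reshoot was actually assembled.

```python
def schedule_stems(stems, clip_seconds, gap=0.25):
    """Place TTS stems back to back on one clip's timeline.

    stems: list of (speaker, duration_seconds) in speaking order.
    Returns a list of (speaker, start, end) tuples.
    Raises ValueError if the dialogue runs past the end of the clip.
    """
    timeline = []
    cursor = 0.0
    for speaker, duration in stems:
        timeline.append((speaker, cursor, cursor + duration))
        cursor += duration + gap  # leave a short pause before the next line

    dialogue_end = timeline[-1][2] if timeline else 0.0
    if dialogue_end > clip_seconds:
        raise ValueError(
            f"dialogue ends at {dialogue_end:.2f}s but clip is {clip_seconds:.2f}s"
        )
    return timeline


# Illustrative durations only: three lines traded between two speakers.
stems = [("speaker_a", 2.0), ("speaker_b", 1.5), ("speaker_a", 2.0)]
for speaker, start, end in schedule_stems(stems, clip_seconds=8.0):
    print(f"{speaker}: {start:.2f}s -> {end:.2f}s")
```

Running the check at scripting time would surface an overlong exchange before the clip is generated, when trimming a line is still cheap.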