What Went Well
- Parallel batch generation was highly effective. Running 4 scene batches simultaneously (Scenes 2-5) reduced total generation time from ~45 minutes sequential to ~15 minutes wall-clock. The async self-message callback pattern worked perfectly for receiving completion notifications.
- The from-frames pipeline delivered consistent results. Using GCS-hosted storyboard start/end frames as first-last frame inputs to Veo 3.1 produced visually coherent clips that tracked the storyboard well. 37/37 shots passed verification on first attempt (no reshoots requested by editor or idea person).
- 720p Gold Standard met across the board. Every generated clip was exactly 1280x720, 8 seconds, h264+aac. The genmedia-verify tool gave instant confirmation.
- MoGraph agent delegation worked smoothly. Providing a clear brief (aspect ratio, text content, style prompt) let the MoGraph agent deliver a polished 7-second title card autonomously without back-and-forth.
- Credits music generation was fast. Lyria 3 Clip produced a suitable 27.5s warm glockenspiel piece in under 10 seconds.
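The Gold Standard check described above can also be expressed as a small standalone test. The sketch below is an assumption-laden illustration, not how genmedia-verify works: it expects clip metadata already parsed from the JSON that `ffprobe -v error -print_format json -show_format -show_streams <file>` prints. The function name `meets_gold_standard` and the 0.1 s duration tolerance are hypothetical.

```python
def meets_gold_standard(probe: dict, tolerance_s: float = 0.1) -> list[str]:
    """Return a list of problems; an empty list means the clip passes.

    `probe` is assumed to be the parsed JSON output of ffprobe
    (-show_format -show_streams). The tolerance is an illustrative choice.
    """
    problems = []
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)

    if video is None:
        problems.append("no video stream")
    else:
        size = (video.get("width"), video.get("height"))
        if size != (1280, 720):
            problems.append(f"resolution is {size[0]}x{size[1]}, expected 1280x720")
        if video.get("codec_name") != "h264":
            problems.append(f"video codec is {video.get('codec_name')}, expected h264")

    if audio is None:
        problems.append("no audio stream")
    elif audio.get("codec_name") != "aac":
        problems.append(f"audio codec is {audio.get('codec_name')}, expected aac")

    # ffprobe reports the container duration as a string, e.g. "8.000000".
    duration = float(probe.get("format", {}).get("duration", 0.0))
    if abs(duration - 8.0) > tolerance_s:
        problems.append(f"duration is {duration:.2f}s, expected 8s")

    return problems
```

Returning a list of problems rather than a boolean keeps the report useful when a clip fails on more than one property.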
What Didn’t Go Well
- File race condition with parallel output directory. When 4 concurrent `genmedia-video from-frames` batches wrote to the same output directory, the timestamp-based auto-filenames caused `mv` failures for 3 shots (09, 11, 28). The generated file was downloaded but the rename to the canonical `shot_XX.mp4` name failed because a concurrent process had altered the directory state. Required manual regen of those 3 shots.
- Google content safety filter false positives. Two prompts were rejected: shot 08’s original prompt (containing “HIC-CUP shudder-clank”) and shot 21’s prompt (containing “glowing ivory feather floating past a young girl’s face”). The retry with simplified prompts succeeded, but this cost ~2 minutes per affected shot.
- USAGE.md flag name discrepancy. The documentation shows `--first-frame-uri`/`--last-frame-uri` but the actual tool uses `--first-image-uri`/`--last-image-uri`. This caused the entire first batch attempt (30 shots) to fail silently with empty stdout, wasting several minutes.
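One way to sidestep the directory contention described in the first item is to give every generation job a private scratch directory, so that the only new file appearing there is that job's own clip. The sketch below is a minimal illustration under stated assumptions: `generate` is a hypothetical callable standing in for whatever invokes the generator and writes one auto-named `.mp4` into the directory it is given, and `generate_isolated` is an illustrative name, not part of the genmedia tooling.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def generate_isolated(
    shot_id: str,
    generate: Callable[[Path], None],  # hypothetical: writes one .mp4 into the given dir
    shots_dir: Path,
) -> Path:
    """Generate one shot in a private scratch dir, then move it to shot_XX.mp4."""
    shots_dir.mkdir(parents=True, exist_ok=True)
    # The scratch directory lives inside shots_dir so the final move stays on
    # one filesystem, where os.replace is an atomic rename.
    with tempfile.TemporaryDirectory(prefix=f"shot_{shot_id}_", dir=shots_dir) as scratch:
        generate(Path(scratch))
        produced = sorted(Path(scratch).glob("*.mp4"))
        if len(produced) != 1:
            raise RuntimeError(
                f"shot {shot_id}: expected exactly 1 clip, found {len(produced)}"
            )
        final = shots_dir / f"shot_{shot_id}.mp4"
        os.replace(produced[0], final)
    return final
```

Because no other job ever writes into the scratch directory, the rename no longer depends on guessing which timestamped file belongs to which shot.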
Failure Modes & Bottlenecks
- Silent failures in batch scripts. When `genmedia-video` received an invalid flag, it printed usage to stderr (which was redirected to /dev/null) and returned empty stdout. The JSON parsing then failed with a generic “parse error” that didn’t indicate the root cause. Adding stderr capture or a pre-flight flag validation would catch this immediately.
- The `scion message` `set[]` syntax failed in zsh. The `set[eta-idea,eta-editor]` format for group messaging didn’t work (zsh interpreted brackets as glob patterns). Fell back to individual messages. Quoting the argument would likely fix this.
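Both failure modes above come down to how the batch script invokes external tools. The following sketch shows one possible wrapper that keeps stderr instead of discarding it, so an invalid flag surfaces as a readable error rather than a generic JSON parse failure. `run_json` and `ToolError` are hypothetical names, and the wrapper assumes only that the tool prints a JSON object to stdout on success.

```python
import json
import subprocess
import sys


class ToolError(RuntimeError):
    """Raised when a CLI call fails or produces no usable JSON."""


def run_json(argv: list[str]) -> dict:
    """Run a command, capture stdout and stderr, and parse stdout as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    stderr = proc.stderr.strip() or "<empty>"
    if proc.returncode != 0 or not proc.stdout.strip():
        # Surface the tool's own message (e.g. a usage error) to the caller.
        raise ToolError(f"{argv[0]} exited with {proc.returncode}; stderr: {stderr}")
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise ToolError(f"{argv[0]} printed non-JSON stdout ({exc}); stderr: {stderr}") from exc
```

Passing the command as an argument list also means no shell is involved, so a literal argument such as `set[eta-idea,eta-editor]` reaches the program unchanged instead of being treated as a glob pattern.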
Key Decisions Made
- Used from-frames instead of from-image. The storyboard had both start and end frames for every shot, making from-frames (first-last frame interpolation) the natural choice over from-image (single frame animation). This gave Veo better context for the motion trajectory.
- Fixed 8-second duration for all shots. Rather than varying duration per the shot breakdown (4-7s planned), used uniform 8s for all shots. The Overhang Principle requires extra pre/post-roll anyway, and the editor can trim to exact timing in the timeline. More footage is always better than less.
- Added automatic retry with simplified prompt. Built a retry mechanism into the batch script that falls back to a minimal “Claymation stop-motion animation, cinematic, smooth motion” prompt when the detailed prompt triggers content filters. This ensured shots weren’t blocked by over-specific language.
- Ran missing shot regens individually, not in parallel. After identifying the 3 race-condition casualties, regenerated them one-at-a-time to avoid the same directory contention.
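The retry decision above can be sketched in a few lines. This is an illustration of the fallback idea only, not the actual batch script: `generate` is a hypothetical callable that returns a clip path, and `ContentFilterError` is a hypothetical exception standing in for however a safety-filter rejection is detected.

```python
from typing import Callable

# The minimal prompt quoted in the decision above.
FALLBACK_PROMPT = "Claymation stop-motion animation, cinematic, smooth motion"


class ContentFilterError(Exception):
    """Hypothetical signal that a prompt was rejected by the safety filter."""


def generate_with_fallback(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, clip path out
) -> tuple[str, str]:
    """Try the detailed prompt once, then retry once with the minimal prompt.

    Returns (clip_path, prompt_used) so the caller can log which shots
    were generated without their detailed description.
    """
    try:
        return generate(prompt), prompt
    except ContentFilterError:
        # A second rejection propagates: the fallback is not retried again.
        return generate(FALLBACK_PROMPT), FALLBACK_PROMPT
```

Returning the prompt that was actually used makes it easy to list the affected shots afterwards, since a fallback clip is guided only by the start and end frames and the generic style line.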
Suggestions for Improvement
- Use per-batch output directories in parallel batches. Each batch should write to a temporary subdirectory (e.g., `shots/batch_s2/`) and then move files to the canonical location after generation. This eliminates the filename race condition entirely.
- Fix USAGE.md flag names. The `from-frames` documentation should use `--first-image-uri`/`--last-image-uri` to match the actual CLI tool.
- Add `--output-file` support to genmedia-video. Being able to specify the output filename (like genmedia-music supports) would eliminate the rename step entirely.
- Content safety prompt cookbook. Create a shared list of words/phrases known to trigger Google’s safety filters so tech leads can avoid them proactively. “HIC-CUP” and descriptions of items near children’s faces seem to be trigger patterns.
- Earlier MoGraph agent start. The title card could have been started during Step 5 (principal photography) rather than waiting for Step 7. It’s independent of the video shots and the ~4 minute agent startup time adds unnecessary latency to the final assembly pipeline.
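The prompt cookbook suggested above could start as a simple pre-flight scan run before a batch is submitted. The sketch below is hypothetical throughout: `find_triggers` and `COOKBOOK` are illustrative names, and the second cookbook entry is only a guess at which part of shot 21's rejected prompt mattered, since the exact trigger is not known.

```python
def normalize(text: str) -> str:
    """Fold case and straighten curly apostrophes so matching is forgiving."""
    return text.replace("\u2019", "'").casefold()


# Seed entries taken from the two rejected prompts. Which words actually
# tripped the filter is an assumption, not a confirmed finding.
COOKBOOK = ["HIC-CUP", "girl's face"]


def find_triggers(prompt: str, cookbook: list[str] = COOKBOOK) -> list[str]:
    """Return the cookbook phrases that occur in the prompt."""
    haystack = normalize(prompt)
    return [phrase for phrase in cookbook if normalize(phrase) in haystack]
```

A batch script could call `find_triggers` on every prompt and flag matches for rewording before any generation time is spent.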