Synthesis Rig Assessment — “The Ferret Incident”
Author: rho-techlead | Step 1 Prep | 2026-05-20
Locked Constraints (from Step 0)
- Zero on-screen physical contact between bellhop and ferret
- Zero fluid camera movement — locked tripod, whip-cut transitions only
- Each shot is a static tableau with max 2 subjects
- Wes Anderson aesthetic: meticulous symmetry, pastel palette, flat wide-angle compositions
Optimal API Selection
Image Generation (Steps 3-4: Characters, Settings, Storyboard)
Primary: gemini-3-pro-image-preview (Nano Banana Pro) — default for genmedia-image generate
- Best for: Character reference chains, composite sheets, storyboard frames
- Why: Supports --reference-image chaining (critical for character consistency). Supports all aspect ratios including 16:9.
- Use for: All character headshots, body sheets, scene tests, composite sheets, setting references, and storyboard start/end frames.
Secondary: imagen-4.0-generate-001 (Imagen 4) — via genmedia-image imagen
- Best for: High-fidelity environment/setting references where reference chaining isn’t needed
- Why: 2K resolution option gives sharper detail for setting reference images (hotel lobby, corridors, desk compositions)
- Use for: Master setting references (empty hotel environments) — these don’t need character reference chaining.
Video Generation (Step 5: Principal Photography)
Primary: veo-3.1-fast-generate-001 (Veo 3.1 Fast) — default model
- Best for: Base shot generation (4-8s clips)
- Why: Supports from-frames (start+end keyframe interpolation), generates audio, fastest turnaround
- Use for: All standard tableau shots. The locked-tripod, static-composition approach means from-image is ideal — animate one keyframe with minimal motion described in the prompt.
From-Frames vs From-Image Decision:
- From-Image (genmedia-video from-image): Use for shots with minimal visual delta (character holds pose, clock ticks, static composition with subtle ambient motion like curtain sway). This is the MAJORITY of our shots given the static tableau mandate.
- From-Frames (genmedia-video from-frames): Use for shots with defined visual change (e.g., start: tidy desk → end: toppled lamp/scattered papers). The start/end frames lock the transition.
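The decision rule above can be sketched as a small helper. This is only an illustration: the function name `choose_video_mode` and its arguments are hypothetical, and the only names taken from this document are the two subcommands, from-image and from-frames.

```python
from typing import Optional


def choose_video_mode(start_frame: str, end_frame: Optional[str] = None) -> str:
    """Pick a genmedia-video subcommand name for one shot (hypothetical helper).

    A distinct end frame signals a defined visual change, so the shot is
    interpolated between two keyframes. Otherwise a single keyframe is animated.
    """
    if end_frame is not None and end_frame != start_frame:
        return "from-frames"
    return "from-image"


# A held pose has no end frame; a toppled-lamp beat has one.
print(choose_video_mode("desk_tidy.png"))
print(choose_video_mode("desk_tidy.png", "desk_toppled.png"))
```

Keeping the rule in one place means the shot list only has to record whether an end frame exists.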
Extend model: veo-3.1-lite-generate-001 (Veo 3.1 Lite)
- Only model supporting extend. Each extend adds exactly 7 seconds.
- Use sparingly — most Anderson-style shots are short (4-6s). Reserve for any establishing shot or montage beat that needs >8s.
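The extend arithmetic can be checked with a short sketch. It assumes an 8-second base clip (the upper end of the 4-8s range above) and the fixed 7-second increment per extend; the function name is hypothetical.

```python
import math

EXTEND_SECONDS = 7  # from the note above: each extend adds exactly 7 seconds


def extends_needed(target_seconds: float, base_seconds: float = 8.0) -> int:
    """Number of extend calls needed to reach at least target_seconds.

    base_seconds defaults to 8, an assumption based on the 4-8s base clip range.
    """
    if target_seconds <= base_seconds:
        return 0
    return math.ceil((target_seconds - base_seconds) / EXTEND_SECONDS)


for target in (6, 12, 16):
    n = extends_needed(target)
    print(f"{target}s target: {n} extend(s), {8 + n * EXTEND_SECONDS}s generated")
```

Because extends come in fixed 7-second steps, a shot slightly over 8 seconds still generates 15 seconds of footage, which is one more reason to reserve extends for the few shots that need them.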
Voice Generation (Step 6: Narration & Dialogue)
Model: gemini-3.1-flash-tts-preview via genmedia-voice generate
- Voice selection TBD after character profiles are finalized
- Narrator voice: Need dry, clinical, measured — candidates: Fenrir (authoritative, measured), Enceladus, or Schedar. Will audition in Step 3.
- Bellhop voice: Need uptight, precise — shorter lines
- Inspector voice: Need pompous, clipped — brief appearances only
- 800-char limit per call — split longer narration passages
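Splitting narration for the 800-character limit could look like the sketch below. It is a plain-Python illustration that breaks at sentence boundaries where possible; the function name and the hard-split fallback are assumptions, not part of the genmedia-voice tooling.

```python
import re

MAX_CHARS = 800  # per-call limit noted above


def split_narration(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split narration into chunks of at most `limit` characters.

    Chunks break at sentence ends. A single sentence longer than the limit
    is hard-split on the last space that fits (or mid-word if there is none).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        while len(sentence) > limit:
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Breaking at sentence ends keeps each call's audio from stopping mid-phrase, which should make the joins easier to hide in the edit.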
Music Generation (Step 6: Score)
Primary: lyria-3-pro-preview (Lyria 3 Pro)
- ~2:30 duration — sufficient for full musical arc segments
- Use for: Main score (quirky plucked strings / glockenspiel motif per editor’s musical arc)
Secondary: lyria-3-clip-preview (Lyria 3 Clip)
- ~30s clips — use for specific SFX stingers or bridge cues
Wes Anderson Prompt DNA
Every image and video prompt MUST include these Tone Anchor keywords (pending the final Tone Contract from rho-idea; these are my technical recommendations):
Wes Anderson style, meticulous symmetry, centered composition,
pastel color palette, flat wide-angle lens, locked tripod shot,
warm soft lighting, whimsical, deadpan, precise staging
Anti-Drift Keywords (Genre Counterbalance)
Models will try to make this dramatic/moody. Fight it with:
bright even lighting, NO shadows, NO dramatic contrast,
NO cinematic depth of field, flat perspective,
saturated pastels, theatrical staging, comedy
Ferret-Specific Prompts
To combat anthropomorphism bias:
realistic white ferret, no cartoon features, no googly eyes,
natural animal proportions, literal ferret behavior
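One way to make sure no prompt ships without these keyword sets is to assemble every prompt through a single helper. The sketch below copies the three keyword lists from above; the function name `build_prompt` and the `has_ferret` flag are hypothetical.

```python
TONE_ANCHORS = (
    "Wes Anderson style, meticulous symmetry, centered composition, "
    "pastel color palette, flat wide-angle lens, locked tripod shot, "
    "warm soft lighting, whimsical, deadpan, precise staging"
)
ANTI_DRIFT = (
    "bright even lighting, NO shadows, NO dramatic contrast, "
    "NO cinematic depth of field, flat perspective, "
    "saturated pastels, theatrical staging, comedy"
)
FERRET_KEYWORDS = (
    "realistic white ferret, no cartoon features, no googly eyes, "
    "natural animal proportions, literal ferret behavior"
)


def build_prompt(shot_description: str, has_ferret: bool = False) -> str:
    """Append the shared keyword sets to a shot description (hypothetical helper)."""
    parts = [shot_description.strip().rstrip("."), TONE_ANCHORS, ANTI_DRIFT]
    if has_ferret:
        # Ferret keywords are added only when the animal is in frame.
        parts.append(FERRET_KEYWORDS)
    return ", ".join(parts)


print(build_prompt("A white ferret sits on the reception bell.", has_ferret=True))
```

If the final Tone Contract changes the wording, only the constants need updating.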
Reference Budget Strategy (Per Shot Type)
| Shot Type | Ref 1 | Ref 2 | Ref 3 |
|---|---|---|---|
| Bellhop solo | bellhop_sheet.png | hotel_setting.png | (available) |
| Ferret solo | ferret_sheet.png | hotel_setting.png | (available) |
| Aftermath/environment | hotel_setting.png | (available) | (available) |
| Bellhop + Inspector | bellhop_sheet.png | inspector_sheet.png | hotel_setting.png |
| Insert (bell, clock, ledger) | object_ref.png | hotel_setting.png | (available) |
The budget is comfortable for every shot type: no shot needs more than 3 references.
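The table can also be held as data so a script can enforce the 3-reference ceiling. This is a sketch only: the dictionary keys and the `refs_for` helper are hypothetical names, while the file names come from the table.

```python
MAX_REFS = 3  # ceiling stated above

# Shot type -> reference images, mirroring the table.
REFERENCE_PLAN = {
    "bellhop_solo": ["bellhop_sheet.png", "hotel_setting.png"],
    "ferret_solo": ["ferret_sheet.png", "hotel_setting.png"],
    "aftermath": ["hotel_setting.png"],
    "bellhop_inspector": ["bellhop_sheet.png", "inspector_sheet.png", "hotel_setting.png"],
    "insert": ["object_ref.png", "hotel_setting.png"],
}


def refs_for(shot_type: str, extra: tuple = ()) -> list:
    """Return the references for a shot type, refusing to exceed the budget."""
    refs = list(REFERENCE_PLAN[shot_type]) + list(extra)
    if len(refs) > MAX_REFS:
        raise ValueError(f"{shot_type}: {len(refs)} references exceeds budget of {MAX_REFS}")
    return refs


print(refs_for("ferret_solo", extra=("object_ref.png",)))
```

The "(available)" slots in the table correspond to the `extra` argument; the two-character shot is the only type with no spare slot.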
Production Efficiency Notes
- Python orchestration: all batch generation scripts will use the /workspace/tools/genmedia.py wrapper, never raw bash loops (quote-safe prompts with special characters).
- Overhang Principle: every shot generated with +4s (2s pre-roll + 2s post-roll) for trim flexibility.
- QA Helper delegation: batch generation execution and verify-dailies checks will be delegated to a qa-helper agent per the Rule of 10.
- Skip-existing: all scripts use skip_existing=True (default) for safe re-runs after partial failures.
- Shot manifest: shot-manifest.json generated alongside principal photography scripts for the verify-dailies gate.
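The overhang, skip-existing, and manifest notes above could combine roughly as follows. This is a sketch under stated assumptions, not the genmedia.py interface: the `generate` callable stands in for whatever the wrapper exposes, and the manifest fields (`id`, `file`, `requested_seconds`, `status`) are a hypothetical schema.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

OVERHANG_SECONDS = 4  # 2s pre-roll + 2s post-roll, per the Overhang Principle


def run_batch(
    shots: List[Dict],
    out_dir: str,
    generate: Callable[[str, int, Path], None],  # hypothetical stand-in for the wrapper call
    skip_existing: bool = True,
) -> List[Dict]:
    """Generate each shot unless its clip already exists, then write the manifest."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for shot in shots:
        clip = out / f"{shot['id']}.mp4"
        requested = shot["duration"] + OVERHANG_SECONDS
        if skip_existing and clip.exists():
            status = "skipped"  # safe re-run after a partial failure
        else:
            generate(shot["prompt"], requested, clip)
            status = "generated"
        manifest.append(
            {"id": shot["id"], "file": clip.name, "requested_seconds": requested, "status": status}
        )
    # Hypothetical manifest schema for the verify-dailies gate.
    (out / "shot-manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Passing the generator in as a callable keeps the loop testable without calling any model, and a re-run only pays for the shots that failed the first time.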