techlead_generatability_review.md

Tech Lead Generatability Review — 3 Sparks

Recommendation: Spark 3 (The King’s Ransom)

Spark 1: Illusions of Grandeur — MEDIUM

Pros: Strong visual DNA (neon Vegas + arid desert = two clean palettes). Magician costume is a great reference anchor. Smoke/pyro FX are amorphous — forgiving for generation models, no geometric precision needed.

Cons: 3 main characters plus cult extras make for a tight reference budget. Veo allows at most 3 reference images per shot, so with 3 protagonists we burn all refs on characters and have nothing left for environment anchoring. The chase sequence on a rickety truck is complex fast motion: Veo handles moderate motion well, but fast vehicle chases risk major artifacts. The cult also requires generating consistent extras across multiple shots, which is a known weak point.

Risk: Reference budget exhaustion + complex vehicle motion.
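
The reference-budget concern can be checked with quick arithmetic. This is a minimal sketch, assuming one reference image per on-screen character; the review states only the 3-image cap, and the function names are hypothetical.

```python
MAX_REFS_PER_SHOT = 3  # per-shot reference cap stated in this review


def spare_refs(characters_on_screen, max_refs=MAX_REFS_PER_SHOT):
    """Slots left for environment anchoring.

    Assumption: each on-screen character consumes exactly one reference image.
    """
    return max(0, max_refs - characters_on_screen)


def unanchored_characters(characters_on_screen, max_refs=MAX_REFS_PER_SHOT):
    """Characters that would get no reference image at all in a shot."""
    return max(0, characters_on_screen - max_refs)


print(spare_refs(3))  # 0: three protagonists leave nothing for the environment
print(spare_refs(2))  # 1: a two-character shot keeps one slot free
```

Under this assumption, any shot with all three protagonists has zero slots left for the environment, which is the exhaustion risk named above.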

Spark 2

Pros: Dresses-in-desert gives great visual contrast, and sun-drenched lighting is clean.

Cons: The 10-tier wedding cake is a dealbreaker. Maintaining a complex geometric object with precise detail across 20+ shots is the hardest thing you can ask of current image/video models. The cake will morph, lose tiers, and change proportions in every generation. Add a dune buggy chase (complex terrain motion), 3 bridesmaids, and armed rivals, and there are too many moving parts.

Risk: Object consistency failure (cake) + too many characters + chase motion.

Spark 3: The King’s Ransom

Pros:

  1. Character anchoring: The Elvis impersonator is the single best anchor character possible. The costume is globally recognizable, and any model knows what Elvis looks like. The grandmother, agent, and thief are all visually distinct archetypes. There are 4 characters, but only 2-3 need to be on screen at once (limo interior), which fits our 3-ref limit.

  2. Environment simplicity: 80% of the film takes place in ONE master setting (limo interior). That is consistency gold. We define one rich limo interior prompt and reuse it across most shots. The remaining 20% is Vegas exterior (neon, night) and desert highway (dark, sparse). Three environments total.

  3. Motion complexity = LOW. This is a dialogue-driven character piece. People sitting, talking, reacting. Moderate gestures. Veo handles this extremely well. No fast chases, no complex stunts.

  4. Night/interior scenes are forgiving. Darkness and controlled lighting hide the small inconsistencies that generation models produce. Shadow naturally covers what the model cannot resolve.

  5. Strong visual arc: Neon Vegas to dark desert highway to dawn. Clean lighting progression that gives us distinct visual acts without needing many sets.

  6. Audio design: A claustrophobic interior needs only limited ambient audio (engine hum, radio). The piece is dialogue-heavy, which is where our Veo integrated audio prompting excels, and it calls for less complex foley/SFX work.
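
Points 1 and 2 above can be sketched as a small shot-planning helper. This is a sketch under stated assumptions, not the team's actual pipeline: the `Shot` class, the `ref:` labels, and the placeholder prompt strings are all hypothetical, and handing any spare reference slot to the environment is an assumed policy. It calls no Veo API and only organizes data.

```python
from dataclasses import dataclass

MAX_REFS_PER_SHOT = 3  # per-shot reference cap stated in this review

# Hypothetical placeholders: the real master prompts would be written by the team.
ENVIRONMENTS = {
    "limo_interior": "<master limo interior prompt>",
    "vegas_exterior": "<neon Vegas night exterior prompt>",
    "desert_highway": "<dark desert highway prompt>",
}


@dataclass
class Shot:
    environment: str       # key into ENVIRONMENTS
    characters: list[str]  # characters on screen in this shot
    action: str            # what happens in the shot

    def reference_slots(self) -> list[str]:
        """One ref per character; any spare slot anchors the environment."""
        refs = [f"ref:{name}" for name in self.characters]
        if len(refs) > MAX_REFS_PER_SHOT:
            raise ValueError(
                f"{len(refs)} characters exceed the {MAX_REFS_PER_SHOT}-ref cap"
            )
        if len(refs) < MAX_REFS_PER_SHOT:
            refs.append(f"ref:{self.environment}")
        return refs

    def prompt(self) -> str:
        """Reuse the shared environment text, then add the shot-specific action."""
        return f"{ENVIRONMENTS[self.environment]} {self.action}"


shot = Shot("limo_interior", ["elvis", "grandmother"], "They argue quietly.")
print(shot.reference_slots())
print(shot.prompt())
```

Keeping the environment text in one table means every limo shot starts from the same wording, and the cap check flags any shot that tries to put all four characters on screen at once.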

Bottom line from the synthesis rig: Spark 3 lets us pour all our generation budget into character performance and dialogue rather than fighting environmental complexity, object consistency, or fast motion artifacts. The tools work FOR the story instead of against it.