Generatability Review — Story Sparks

Reviewer: kappa-techlead (Director of Photography)
Date: 2026-05-18

Evaluation Criteria

Each spark assessed against our synthesis rig capabilities:

Character consistency (reference chaining, max 3 refs per Veo shot, max 2 chars per shot)
Visual style fidelity (can the model sustain this look across 20+ shots?)
Camera/motion complexity (Veo handles slow/static best, struggles with complex action)
Genre drift risk (models default to dramatic/noir — how hard do we fight?)
Safety filter risk (will prompts trigger content filters?)
Audio pipeline fit (TTS dialogue, Lyria score, Veo ambient audio)
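
The two hard limits in the first criterion (3 references per shot, 2 characters per shot) can be checked mechanically before a shot goes to generation. A minimal sketch, assuming a hypothetical `Shot` record; the class and field names are illustrative and not part of any Veo API:

```python
from dataclasses import dataclass, field

# Limits taken from the evaluation criteria above.
MAX_REFS_PER_SHOT = 3
MAX_CHARS_PER_SHOT = 2


@dataclass
class Shot:
    """Hypothetical shot record; field names are illustrative only."""
    shot_id: str
    characters: list[str] = field(default_factory=list)
    reference_images: list[str] = field(default_factory=list)


def validate_shot(shot: Shot) -> list[str]:
    """Return human-readable constraint violations (empty list if the shot is OK)."""
    problems = []
    if len(shot.characters) > MAX_CHARS_PER_SHOT:
        problems.append(
            f"{shot.shot_id}: {len(shot.characters)} characters "
            f"(max {MAX_CHARS_PER_SHOT})"
        )
    if len(shot.reference_images) > MAX_REFS_PER_SHOT:
        problems.append(
            f"{shot.shot_id}: {len(shot.reference_images)} references "
            f"(max {MAX_REFS_PER_SHOT})"
        )
    return problems
```

Running this over a full shot list before generation would flag over-budget shots early, when re-planning is still cheap.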

Spark 1: “The Midnight Audit” — Claymation Mockumentary

Generatability Score: 9/10 ✅ STRONG RECOMMEND

Factor	Rating	Notes
Character consistency	⭐⭐⭐⭐⭐	Anthropomorphized objects (stapler, paperclip) are inherently more forgiving than human faces. Clay texture masks minor drift.
Visual style	⭐⭐⭐⭐⭐	“Claymation” is a strong style anchor — Gemini/Nano Banana generates this well. Imperfections read as charm, not failure.
Camera/motion	⭐⭐⭐⭐⭐	Mockumentary = talking heads + B-roll. Static or slow push-in shots. Perfect for Veo.
Genre drift	⭐⭐⭐⭐	Moderate risk. “Bright, playful, warm studio lighting, tactile claymation” are strong tone anchors.
Safety	⭐⭐⭐⭐⭐	Negligible risk. Office-supply characters are very unlikely to trip content filters.
Audio	⭐⭐⭐⭐⭐	Comedy dialogue via TTS (distinct voices for each supply). Quirky score via Lyria.

Key advantages:

Claymation is among the most forgiving visual styles for gen AI. Inconsistencies tend to read as handmade texture.
Mockumentary format is structurally perfect: interview segments (static frame, single character talking) + cutaway B-roll (action shots). This maps directly to our shot types.
Non-human characters sidestep the hardest problem (human facial consistency across shots).

Risks & mitigations:

Can Veo animate a talking stapler convincingly? → Generate strong reference chains with a clearly defined “face” region on each object. If Veo struggles, fall back to static claymation with voiceover (VO shots instead of DIALOGUE).
Genre drift to “Toy Story drama” → Lock in tone anchors: “bright, absurd, comedic, Aardman-style claymation, warm overhead lighting, playful.”
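
Both mitigations can be applied at prompt-assembly time. A minimal sketch, assuming prompts are plain strings and that shot types are the labels `DIALOGUE` and `VO` used above; the function names are hypothetical:

```python
# Tone anchors quoted from the mitigation above; appended to every shot prompt.
TONE_ANCHORS = (
    "bright",
    "absurd",
    "comedic",
    "Aardman-style claymation",
    "warm overhead lighting",
    "playful",
)


def build_prompt(action: str, anchors: tuple[str, ...] = TONE_ANCHORS) -> str:
    """Append the locked tone anchors to a shot description."""
    return f"{action.rstrip('. ')}. Style: {', '.join(anchors)}."


def shot_type(talking_animation_ok: bool) -> str:
    """Fall back from DIALOGUE to VO when talking animation is not convincing."""
    return "DIALOGUE" if talking_animation_ok else "VO"
```

Keeping the anchors in one constant means every shot carries the same tone wording, which is the point of locking them in.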

Spark 2: “The Time-Traveling Teacup” — Whimsical Romance

Generatability Score: 6/10 ⚠️ BEAUTIFUL BUT RISKY

Factor	Rating	Notes
Character consistency	⭐⭐	Two human protagonists across many shots = hardest consistency problem. Reference chains must be meticulous.
Visual style	⭐⭐⭐⭐⭐	Soft focus, golden hour, ethereal — diffusion models excel here. This is their native territory.
Camera/motion	⭐⭐⭐⭐	Slow, deliberate moves fit Veo well.
Genre drift	⭐⭐⭐⭐	Romance/ethereal is close to model defaults. Less fighting needed.
Safety	⭐⭐⭐⭐	Minor romance content risk, but “silent, longing” is safe.
Audio	⭐⭐⭐	Relies on emotional TTS narration + romantic score. No dialogue between characters (silent romance) = good for VO-safe shots.

Key advantages:

The visual palette (pastels, golden hour, soft focus) is where generative models shine brightest.
“Silent romance” means VO-heavy, which avoids the lip-sync consistency problem entirely.
Emotionally resonant concept — high storytelling ceiling.

Risks & mitigations:

Human face consistency across 20+ shots: This is the single biggest risk. Two full reference chains needed. Budget: 1 character sheet + 1 setting ref = 2 of 3 Veo reference slots for a single-character shot. A two-character shot with a setting reference needs 2 character sheets + 1 setting ref = 3 of 3, so it fits but leaves no spare slot for any other reference. The max-2-chars-per-shot limit keeps every shot within this budget.
Two timelines (present vs 1950s): Need distinct visual signatures. Can encode in prompts: “warm golden present-day” vs “desaturated sepia, vintage film grain.”
Emotional subtlety: Veo is mediocre at nuanced facial expressions. “Longing glance” is hard to prompt. The narration must carry the emotion, not the visuals.
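
The reference-slot arithmetic and the two timeline signatures can be written down as a small planning helper. A minimal sketch, assuming each character sheet and each setting reference occupies exactly one slot; the helper name and the timeline keys are hypothetical:

```python
MAX_REFS_PER_SHOT = 3  # Veo reference limit from the evaluation criteria

# Per-timeline style phrases quoted from the mitigation above.
TIMELINE_STYLE = {
    "present": "warm golden present-day",
    "1950s": "desaturated sepia, vintage film grain",
}


def spare_reference_slots(num_characters: int, use_setting_ref: bool = True) -> int:
    """Slots left after one sheet per character plus an optional setting ref."""
    used = num_characters + (1 if use_setting_ref else 0)
    if used > MAX_REFS_PER_SHOT:
        raise ValueError(f"{used} references needed, max is {MAX_REFS_PER_SHOT}")
    return MAX_REFS_PER_SHOT - used


def timeline_prompt(action: str, timeline: str) -> str:
    """Prefix a shot description with its timeline's visual signature."""
    return f"{TIMELINE_STYLE[timeline]}, {action}"
```

Under these assumptions a single-character shot keeps one slot free, while a two-character shot with a setting reference uses all three.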

Spark 3: “Paranormal Pest Control” — Found Footage Comedy

Generatability Score: 7/10 👍 SOLID MIDDLE GROUND

Factor	Rating	Notes
Character consistency	⭐⭐⭐	Multiple human characters (team + ghost). Found footage grain helps mask drift.
Visual style	⭐⭐⭐⭐⭐	VHS/grainy = intentional degradation. Generation artifacts become features. Can add VHS overlay in post via ffmpeg.
Camera/motion	⭐⭐⭐	“Shaky cam” is harder for Veo to get right consistently. May look jittery rather than handheld.
Genre drift	⭐⭐⭐	“Paranormal” + dark interiors could easily drift to actual horror. Needs strong comedy anchors.
Safety	⭐⭐⭐	“Ghost” content needs careful prompt sanitization. “Polite snob” framing helps, but “ghost exterminators” could trigger filters depending on phrasing.
Audio	⭐⭐⭐⭐	Comedy dialogue via TTS works well. VHS audio artifacts can be added in post.

Key advantages:

VHS degradation is a brilliant mask for generation artifacts.
Comedy through dialogue (ghost critiquing decor) is TTS-friendly.
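
A VHS-style degradation pass in post could be assembled as an ffmpeg filter chain. A rough sketch, assuming an ffmpeg build that includes the `noise`, `eq`, `gblur`, and `crop` filters; the numeric values are untested starting points, and the helper names are hypothetical. The code only builds the argument list and does not run ffmpeg:

```python
def vhs_filter(noise_strength: int = 20, saturation: float = 0.8) -> str:
    """Filter chain: temporal grain, reduced saturation, slight softening."""
    return ",".join([
        f"noise=alls={noise_strength}:allf=t+u",  # temporal + uniform noise
        f"eq=saturation={saturation}",            # washed-out tape colour
        "gblur=sigma=1.2",                        # soften edges slightly
    ])


def shake_filter(margin: int = 8) -> str:
    """Crop a window that drifts over time to imitate handheld movement."""
    return (
        f"crop=iw-{2 * margin}:ih-{2 * margin}:"
        f"{margin}+{margin}*sin(t*7):{margin}+{margin}*cos(t*5)"
    )


def ffmpeg_command(src: str, dst: str, add_shake: bool = False) -> list[str]:
    """Build the argv list; pass it to subprocess.run() to execute."""
    filters = [vhs_filter()]
    if add_shake:
        filters.append(shake_filter())
    return ["ffmpeg", "-y", "-i", src, "-vf", ",".join(filters), "-c:a", "copy", dst]
```

Doing the degradation in post rather than in the prompt keeps the look identical across shots, since the same filter chain is applied to every clip.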

Risks & mitigations:

Shaky cam → Prompt “handheld documentary camera, slight movement” rather than true shaky-cam, or add camera shake in post with ffmpeg.
Safety filters on “ghost” → Use archetype substitution: “translucent elderly gentleman in period clothing” instead of “ghost.” Frame as “theatrical comedy performance.”
3+ characters → Max 2 per shot constraint means careful shot planning. Most shots: 1-2 exterminators OR ghost + 1 exterminator.
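
Archetype substitution can be applied as a text pass over each prompt before it is sent for generation. A minimal sketch, assuming prompts are plain strings; the substitution table holds only the one mapping named above, and the function name is hypothetical:

```python
import re

# Mapping and framing phrase quoted from the mitigation above.
SUBSTITUTIONS = {
    "ghost": "translucent elderly gentleman in period clothing",
}
FRAMING = "theatrical comedy performance"


def sanitize_prompt(prompt: str) -> str:
    """Replace flagged terms (whole words, any case) and append the framing."""
    for term, replacement in SUBSTITUTIONS.items():
        pattern = rf"\b{re.escape(term)}\b"
        prompt = re.sub(pattern, replacement, prompt, flags=re.IGNORECASE)
    return f"{prompt} ({FRAMING})"
```

Whether this wording actually avoids a given filter is something only test generations can confirm; the sketch just keeps the substitution consistent across shots.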

Recommendation

Spark 1 (The Midnight Audit) is the strongest pick from a generatability standpoint. The claymation style is maximally forgiving, the mockumentary format maps perfectly to our shot types, and non-human characters eliminate our hardest consistency problem. The comedy can be carried almost entirely by dialogue (TTS) and absurd situations rather than requiring precise visual acting.

However, if the team wants the higher artistic ceiling, Spark 2 is achievable — we’d just need to be very disciplined about reference chains and accept that the narration carries the emotion. The silent romance concept is actually audio-pipeline-friendly.

I’m ready to execute either. My recommendation as DP: Spark 1 for reliability, Spark 2 for ambition.