Team Fluorite — Step 1: Technical Feasibility & API Strategy
Director of Photography: fluorite-techlead
Film: “Pickling Season”
Date: 2026-05-22
Status: DRAFT — Step 1 deliverable
1. API Selection by Shot Type
This section maps each category of visual content to the generation tool and model best suited to it, with rationale.
1.1 Character Reference Chain (Step 3)
| Asset | Tool | Model | Notes |
|---|---|---|---|
| Headshot | genmedia-image generate | Nano Banana Pro (default) | 16:9. Strong photorealistic faces. Start of chain. |
| Body sheet (3-view) | genmedia-image generate | Nano Banana Pro | --reference-image headshot.png. Critical chain link. |
| Scene test 1 | genmedia-image generate | Nano Banana Pro | --reference-image headshot.png --reference-image body_sheet.png. Place character in World A setting. |
| Scene test 2 | genmedia-image generate | Nano Banana Pro | Same refs. Place character in World B setting. |
| Composite character sheet | genmedia-image generate | Nano Banana Pro | --reference-image headshot.png --reference-image body_sheet.png. 4-view grid on white. 16:9. |
| Grandmother’s hands reference | genmedia-image generate | Nano Banana Pro | EXTRA reference — dedicated “hands at work” image per Tone Contract mandate. Weathered, strong, precise. Used in all food-prep shots. |
Why Nano Banana Pro over Imagen for characters: Gemini’s generate subcommand supports --reference-image chaining, which is the backbone of our consistency strategy. Imagen 4 produces excellent standalone images but is prompt-only: it has no reference-image input, so using it for characters would break the chain. We use Gemini for anything requiring visual continuity.
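The reference chain in the table above can be sketched as a small dependency map. This is a minimal illustration, not the tool's API: only `--reference-image`, `headshot.png`, and `body_sheet.png` come from this document; the other file names and both helper functions are hypothetical, and how the prompt and output path are passed to genmedia-image is deliberately left out because it is not specified here.

```python
# Each asset lists the earlier assets it is conditioned on (section 1.1).
# File names other than headshot.png and body_sheet.png are hypothetical.
CHAIN = {
    "headshot.png": [],
    "body_sheet.png": ["headshot.png"],
    "scene_test_1.png": ["headshot.png", "body_sheet.png"],
    "scene_test_2.png": ["headshot.png", "body_sheet.png"],
    "character_sheet.png": ["headshot.png", "body_sheet.png"],
}


def reference_args(asset: str) -> list[str]:
    """Return the repeated --reference-image flags for one asset."""
    args: list[str] = []
    for ref in CHAIN[asset]:
        args += ["--reference-image", ref]
    return args


def generation_order() -> list[str]:
    """Order assets so every reference exists before it is used."""
    done: list[str] = []
    pending = dict(CHAIN)
    while pending:
        ready = [a for a, refs in pending.items() if all(r in done for r in refs)]
        if not ready:
            raise ValueError("reference chain has a cycle or a missing asset")
        for asset in ready:
            done.append(asset)
            del pending[asset]
    return done
```

Generating in this order means a failed headshot is caught before any downstream asset is produced from it.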
1.2 Setting Reference Images (Step 3)
| Setting | Tool | Model | Prompt Keywords |
|---|---|---|---|
| World A: Grandmother’s kitchen (memory) | genmedia-image generate | Nano Banana Pro | “warm amber light, golden hour, oil painting texture, cluttered kitchen, wooden surfaces, jars of preserved food, herbs hanging, copper pots, stove glow, tactile, lived-in, sensory, 16:9” |
| World B: Granddaughter’s apartment | genmedia-image generate | Nano Banana Pro | “cool blue-grey light, minimalist modern apartment kitchen, clean white surfaces, sparse, chrome appliances, pale wood, large windows, overcast daylight, sterile, precise, 16:9” |
| World A→B Invasion (Act II, 70/30) | genmedia-image generate | Nano Banana Pro | “modern apartment kitchen, mostly cool blue-grey light, BUT one corner has warm amber glow, a jar of pickles on the counter, a sprig of dill, subtle warmth leaking in, minimalist with first signs of invasion” |
| World A→B Invasion (Act III, 30/70) | genmedia-image generate | Nano Banana Pro | “modern apartment kitchen transformed, warm amber light fills most of the frame, jars on every surface, cutting board out, wooden spoon, herbs, the kitchen has become a farmhouse, rich saturated color overtaking the sterile blue-grey” |
4 setting references — one per color-world state. This gives us a visual anchor for the progressive invasion arc.
1.3 Food Close-Ups (Steps 4-5)
| Shot Type | Tool | Model | Notes |
|---|---|---|---|
| Food stills (storyboard frames) | genmedia-image generate OR genmedia-image imagen | Nano Banana Pro or Imagen 4 | Food photography is a proven strength for both. Imagen 4 at 2K resolution gives us textural detail. No character references needed — these are “free” shots. |
| Food video (pickles, brine, chopping) | genmedia-video from-image | Veo 3.1 (default) | Upload food still to GCS → animate with slow motion prompt. “Brine catching light, steam rising, gentle camera push-in.” |
| Extreme macro (cucumber cross-section, jar lid patina) | genmedia-image imagen | Imagen 4 Ultra | Single image at 2K. Maximum detail for signature transition frames. |
Imagen for food stills is acceptable because food close-ups don’t need character reference chaining — they stand alone. We can use Imagen 4’s superior resolution here without breaking continuity.
1.4 Character Scenes (Steps 4-5)
| Shot Type | Tool | Strategy | Model |
|---|---|---|---|
| Single character, static (most shots) | genmedia-video from-frames | Start + End frame interpolation. Character sheet + setting reference as context for frame generation. | Veo 3.1 |
| Single character, slow motion (cooking) | genmedia-video from-image | Single hero frame → animate with motion prompt. “Hands slicing cucumber, deliberate, slow.” | Veo 3.1 |
| Two characters, shared frame (Act III) | genmedia-video from-frames | Both character sheets as references for frames. Static composition — side by side at counter. Max 2 character sheets + 1 setting = 3 refs (at budget). | Veo 3.1 |
| Dialogue shots | genmedia-video from-image or from-frames | Motion prompt must include “speaking, expressive” per Audio Agreement. Use Veo 3.1 for audio generation. | Veo 3.1 |
| VO-safe shots (narrator over visuals) | genmedia-video from-frames | Motion prompt: “mouth closed, focused on task, hands working.” Explicit “No dialogue. No speech. No talking.” in prompt. | Veo 3.1 |
1.5 Audio Pipeline
| Asset | Tool | Model | Notes |
|---|---|---|---|
| Narrator VO (granddaughter) | genmedia-voice generate | Gemini 3.1 Flash TTS | Young woman’s voice. Warm, wry, affectionate. Voice TBD from audition. 800-char limit per call — script will need splitting. |
| Grandmother’s dialogue | genmedia-voice generate | Gemini 3.1 Flash TTS | Older woman’s voice. Imperious, measured, declarative. Different voice profile from narrator. |
| Folk melody (memory world) | genmedia-music generate | Lyria 3 Pro | “Eastern European folk melody, accordion and clarinet, warm, nostalgic but joyful, moderately slow tempo.” ~2:30 duration for full coverage. |
| Sparse ambient (modern world) | genmedia-music generate | Lyria 3 Clip | “Minimal ambient, sparse, clean, modern, barely there, cool, no melody.” ~30s, loop-friendly. |
| Ambient/SFX (kitchen sounds) | Baked into Veo video prompts | Veo 3.1 | Audio prompts: “sound of knife on cutting board, boiling water, glass jar lid opening, cucumber crunch.” Negative: “No music. No soundtrack.” |
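The 800-character limit on narrator VO means the script has to be split before synthesis. A minimal sketch of one way to do that, packing whole sentences greedily so no call cuts a sentence in half; the function name and the sentence-boundary regex are illustrative assumptions, and only the 800-character figure comes from the table above.

```python
import re

TTS_CHAR_LIMIT = 800  # per-call limit stated in the audio pipeline table


def split_script(script: str, limit: int = TTS_CHAR_LIMIT) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("a single sentence exceeds the per-call limit")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go to a separate genmedia-voice generate call, with the resulting clips joined in order during the edit.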
2. Shot-Type Complexity Assessment
Low Complexity (Expected: ~60% of shots)
- Food close-ups: No character reference needed. Prompt-only generation. Proven AI strength.
- Static single-character compositions: Seated grandmother, standing granddaughter. Ideal for from-frames.
- Empty setting establishing shots: Kitchen interior, apartment exterior. Setting reference only.
- Object inserts: Jar on windowsill, suitcase, wrapped jar in cloth. Simple, beautiful.
Medium Complexity (Expected: ~30% of shots)
- Character cooking sequences: Hand-object interaction (slicing, stirring). Requires hands reference + food anchors. Slow, deliberate motion keeps interpolation stable.
- Color invasion transitions: Progressive blend of World A into World B. Requires careful prompt gradient. Setting reference shifts per act.
- Character dialogue shots: Need lip movement from Veo. Must match TTS duration with 20% buffer.
High Complexity (Expected: ~10% of shots)
- Two-character shared frame: Grandmother and granddaughter cooking together (Act III). Uses full 3-ref budget (2 character sheets + 1 setting). Static side-by-side composition preferred — avoid physical contact.
- Grandmother’s hands close-up during cooking: Most important visual per Tone Contract. Requires dedicated hands reference image + food context. Zero tolerance for “smooth young hands” — must look weathered.
3. Safety Filter Risk Assessment
No Risk
- Food in all forms (vegetables, spices, jars, brine, cooking)
- Kitchen tools in domestic context (knife, cutting board, pots)
- Interior settings (kitchen, apartment)
- Clothing, fabric, aprons
Low Risk (Mitigations Defined)
| Trigger | Mitigation |
|---|---|
| “Elderly woman” in close-up | Frame as “character portrait, warm kitchen lighting”; lead with setting, not age descriptors. |
| “Border crossing” / “smuggling” | NEVER render a literal border. Use “carrying,” “bringing.” Imply through narration + objects (suitcase, train window, jar wrapped in cloth). |
| “Knife” in close-up | Always pair with “domestic cooking context, kitchen, food preparation.” Never isolated blade. |
Overall Assessment: Near-Zero Risk (Our Strength)
The “Pickling Season” concept is structurally safe. No violence, no weapons outside cooking context, no politically sensitive content, no real-person references, no supernatural elements. This is a quiet domestic story about food and family. We should encounter essentially zero safety filter walls.
4. Prompt Engineering Strategy
4.1 Tone Anchor Injection (Mandatory)
Every single prompt — image and video — gets these 5 keywords appended:
tactile, warm-hearted, deliberate, sensory, lived-in
Plus visual translation of “warm-hearted”:
gentle golden warmth, soft natural light
4.2 World A Prompt Template
[SCENE DESCRIPTION]. Warm amber light, golden hour, oil painting texture,
rich saturated color, abundance, hand-crafted. [CHARACTER REFERENCE:
grandmother character sheet]. Tactile, warm-hearted, deliberate, sensory,
lived-in. Gentle golden warmth, soft natural light. Photorealistic, 16:9
composition.
4.3 World B Prompt Template
[SCENE DESCRIPTION]. Cool blue-grey light, minimalist interior, clean
surfaces, sparse, modern apartment, muted tones, sterile, precise.
[CHARACTER REFERENCE: granddaughter character sheet]. Tactile, warm-hearted,
deliberate, sensory, lived-in. Gentle golden warmth, soft natural light.
Photorealistic, 16:9 composition.
Note: Even World B gets “warm-hearted” and “gentle golden warmth” — these are emotional anchors that prevent the model from drifting into grim/noir territory. The cool palette keywords will dominate the visual, but the warmth anchors keep it from becoming oppressive.
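The two templates differ only in their palette block, so they can be assembled from shared parts. A minimal sketch, assuming a hypothetical `build_prompt` helper; the keyword strings are copied from sections 4.1 to 4.3, and everything else (names, argument order) is illustrative.

```python
from typing import Optional

# Shared across both worlds (section 4.1).
TONE_ANCHORS = "Tactile, warm-hearted, deliberate, sensory, lived-in."
WARMTH = "Gentle golden warmth, soft natural light."
FORMAT = "Photorealistic, 16:9 composition."

# Palette blocks from the World A and World B templates.
WORLD_STYLE = {
    "A": ("Warm amber light, golden hour, oil painting texture, "
          "rich saturated color, abundance, hand-crafted."),
    "B": ("Cool blue-grey light, minimalist interior, clean surfaces, sparse, "
          "modern apartment, muted tones, sterile, precise."),
}


def build_prompt(scene: str, world: str, character_ref: Optional[str] = None) -> str:
    """Fill the World A or World B template for one scene description."""
    parts = [scene.rstrip(".") + ".", WORLD_STYLE[world]]
    if character_ref:
        parts.append(f"[CHARACTER REFERENCE: {character_ref}].")
    parts += [TONE_ANCHORS, WARMTH, FORMAT]
    return " ".join(parts)
```

Keeping the tone anchors in one constant makes it hard to forget them on a World B prompt, which is where the drift toward a grim look is most likely.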
4.4 Color Invasion Prompt Gradient
| Act | World B Keywords | World A Keywords | Ratio |
|---|---|---|---|
| Act I | Full set | None | 100/0 |
| Act II (early) | Full set | “a single jar of pickles catches warm amber light in the corner” | 90/10 |
| Act II (mid) | Full set | “warm amber glow from one side, a sprig of dill, jar of pickles on counter” | 70/30 |
| Act III (early) | “modern apartment, large windows, pale wood” | “warm amber light fills most of frame, jars on surfaces, cutting board, herbs” | 30/70 |
| Act III (climax) | Residual only (“apartment kitchen”) | Full set | 0/100 |
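The gradient table can be held as a lookup so every shot in an act draws from the same keyword mix. A minimal sketch under two labeled assumptions: “Full set” is read as the palette keywords from the World A and World B templates, and the dominant world's keywords are placed first in the prompt. Stage names and the function are hypothetical.

```python
# Assumption: "Full set" means the palette keywords from sections 4.2 and 4.3.
WORLD_B_FULL = ("cool blue-grey light, minimalist interior, clean surfaces, sparse, "
                "modern apartment, muted tones, sterile, precise")
WORLD_A_FULL = ("warm amber light, golden hour, oil painting texture, "
                "rich saturated color, abundance, hand-crafted")

# stage -> (World B share in percent, World B keywords, World A keywords)
GRADIENT = {
    "act1": (100, WORLD_B_FULL, ""),
    "act2_early": (90, WORLD_B_FULL,
                   "a single jar of pickles catches warm amber light in the corner"),
    "act2_mid": (70, WORLD_B_FULL,
                 "warm amber glow from one side, a sprig of dill, jar of pickles on counter"),
    "act3_early": (30, "modern apartment, large windows, pale wood",
                   "warm amber light fills most of frame, jars on surfaces, cutting board, herbs"),
    "act3_climax": (0, "apartment kitchen", WORLD_A_FULL),
}


def invasion_keywords(stage: str) -> str:
    """Return the blended keyword string for one stage of the invasion arc."""
    b_share, b_keywords, a_keywords = GRADIENT[stage]
    # Assumption: the dominant world leads so its keywords appear first.
    ordered = [b_keywords, a_keywords] if b_share >= 50 else [a_keywords, b_keywords]
    return ", ".join(k for k in ordered if k)
```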
4.5 Anti-Melancholy Drift Keywords
Per Tone Contract, generative models tend to render this subject matter (an elderly woman, memory, family) as melancholy. Counterbalance keywords for grandmother shots:
INCLUDE: confident posture, serene expression, purposeful movement,
warm rich colors, abundance, busy hands
AVOID: lonely, isolated, frail, dim lighting, empty room,
desaturated, wistful, shadowy
4.6 Negative Audio Prompting (Mandatory for All Video)
Every Veo prompt gets:
No music. No soundtrack.
Non-dialogue shots additionally get:
No dialogue. No speech. No talking.
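Because the negatives are mandatory, it may be safer to append them in one place than to type them per shot. A minimal sketch with a hypothetical helper name; the two negative strings are the ones given above.

```python
NO_MUSIC = "No music. No soundtrack."
NO_SPEECH = "No dialogue. No speech. No talking."


def with_audio_negatives(prompt: str, has_dialogue: bool) -> str:
    """Append the mandatory audio negatives to a Veo prompt."""
    parts = [prompt.strip(), NO_MUSIC]
    if not has_dialogue:
        # Non-dialogue (including VO-safe) shots also suppress speech.
        parts.append(NO_SPEECH)
    return " ".join(parts)
```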
4.7 Hands Reference Strategy
The grandmother’s hands are the most important visual in the film. Strategy:
- Dedicated hands reference image in Step 3: “Close-up of elderly woman’s hands on a wooden cutting board, weathered skin, strong fingers, precise grip, warm amber light, tactile, photorealistic.”
- Chain from character headshot to anchor skin tone and age.
- Include hands reference in every food-prep shot prompt alongside the character sheet.
- Negative prompting if drift occurs: “NOT smooth skin, NOT young hands, NOT manicured nails.”
5. Duration & Extend Strategy
Target Runtime: 3:00–5:00 (180–300 seconds)
Based on the Kaurismäki-inspired deadpan pacing:
- Average shot duration: 5-7 seconds (deliberate, held compositions)
- Comedy beats: 6-8 seconds (timing IS the joke — held shots)
- Food insert transitions: 3 seconds (hard cut, held, hard cut)
- Estimated shot count: 25-35 shots
Veo Duration Strategy
The mandate reads: “Apply a flat 4-second overhang to every shot: 2 seconds of pre-roll and 2 seconds of post-roll added to the planned duration. A planned 10s shot must be generated as a 14s clip.” The overhang is therefore added to the planned duration, so the raw clip must run planned + 4s. A Veo base clip is at most 8s; one extend brings the total to 15s and a second to 22s.
| Planned Duration | Raw Needed (planned + 4s) | Veo Base | Extends | Total Raw | Excess to Trim |
|---|---|---|---|---|---|
| ≤4s | ≤8s | 8s | 0 | 8s | 0s at 4s planned |
| 5-8s | 9-12s | 8s | 1 | 15s | 3-6s |
| 9-11s | 13-15s | 8s | 1 | 15s | 0-2s (tight at 11s) |
| 12s+ | 16s+ | 8s | 2 | 22s | Up to 6s; covers planned shots up to 18s |
Worked examples:
- Planned 4s → need 8s raw → 8s base works ✓
- Planned 5s → need 9s raw → 8s base is 1s short → extend once → 15s raw, trim to 9s ✓
- Planned 6s → need 10s raw → extend once → 15s raw, trim to 10s ✓
- Planned 8s → need 12s raw → extend once → 15s raw, trim to 12s ✓
Rule: Generate an 8s base clip for every shot, then extend according to the planned duration:
- Planned ≤4s: 8s base, no extend.
- Planned 5-8s: 8s base + 1 extend = 15s raw.
- Planned 9-11s: 8s base + 1 extend = 15s raw (tight at 11s).
- Planned 12+s: 8s base + 2 extends = 22s raw. Should be rare.
Most of our shots should be 4-7s (deadpan style), so most will need 0-1 extends.
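The overhang and extend arithmetic above reduces to one formula: raw needed = planned + 4s, and each extend adds 7s to the 8s base (8s, 15s, 22s). A minimal sketch, with a hypothetical function name; the 7s gain per extend is inferred from the 15s and 22s totals quoted in this section rather than from any tool documentation.

```python
import math

OVERHANG_S = 4     # 2s pre-roll + 2s post-roll, per the mandate
BASE_CLIP_S = 8    # maximum Veo base clip
EXTEND_GAIN_S = 7  # inferred from the totals above: 8s -> 15s -> 22s


def plan_clip(planned_s: float) -> tuple[int, int]:
    """Return (extends, total_raw_seconds) for a planned shot duration."""
    needed = planned_s + OVERHANG_S
    extends = max(0, math.ceil((needed - BASE_CLIP_S) / EXTEND_GAIN_S))
    return extends, BASE_CLIP_S + extends * EXTEND_GAIN_S
```

Running this over the shot list before generation gives the extend count per shot, which is the main driver of video generation cost.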
6. Reference Budget Planning
Veo allows max 3 reference images per shot. Our budget allocation:
| Shot Type | Ref 1 | Ref 2 | Ref 3 |
|---|---|---|---|
| Single character in setting | Character sheet | Setting reference | Hands ref (if cooking) |
| Two characters in setting | Grandmother sheet | Granddaughter sheet | Setting reference |
| Food close-up (no character) | Setting reference | — | — |
| Food with hands | Hands reference | Setting reference | — |
| Empty establishing shot | Setting reference | — | — |
The composite character sheet (4-view grid) is our primary per-character reference. One sheet per character = efficient budget use.
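The allocation table can be encoded as a lookup with a guard on the 3-reference limit, so an over-budget shot is caught at planning time instead of at generation time. A minimal sketch; the shot-type keys, reference labels, and function name are hypothetical stand-ins for the rows above.

```python
MAX_REFS = 3  # per-shot reference limit stated above

# Rows of the budget table, without the optional hands reference.
REF_PLAN = {
    "single_character": ["character_sheet", "setting"],
    "two_characters": ["grandmother_sheet", "granddaughter_sheet", "setting"],
    "food_closeup": ["setting"],
    "food_with_hands": ["hands", "setting"],
    "establishing": ["setting"],
}


def refs_for(shot_type: str, cooking: bool = False) -> list[str]:
    """Return the reference list for a shot, enforcing the budget."""
    refs = list(REF_PLAN[shot_type])
    if shot_type == "single_character" and cooking:
        refs.append("hands")  # third slot, used only for cooking shots
    if len(refs) > MAX_REFS:
        raise ValueError(f"{shot_type} needs {len(refs)} refs, budget is {MAX_REFS}")
    return refs
```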
7. Production Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Grandmother’s hands inconsistency | MEDIUM | HIGH | Dedicated hands reference image. Include in every food-prep prompt. |
| Melancholy drift (sad instead of funny) | HIGH | HIGH | Aggressive anti-drift keywords. Confident posture descriptors. Rich saturated color. |
| Color invasion gradient looks artificial | LOW | MEDIUM | Test prompt blends in Step 3 scene tests. Gradual keyword ratios. |
| TTS voice mismatch (narrator too old/young) | MEDIUM | MEDIUM | Audition 3-4 voices in Step 3 before committing. |
| Food textures too “AI-perfect” | LOW | LOW | Prompt for “slight imperfection, a pickle with a spot, slightly uneven slice” per Tone Contract. |
| Two-character shots break consistency | MEDIUM | MEDIUM | Limit to 2-3 shared-frame shots. Use side-by-side compositions. No physical contact. |
Ready for fluorite-idea’s high_concept.md and design_brief.md to finalize the visual DNA. Will contribute prompt templates and technical sections to design_brief.md once it exists.