Xi Team — API & Model Recommendations for “Meltdown”

Author: xi-techlead (DP)
Date: 2026-05-19

Optimal model selections for the bright claymation aesthetic.


Image Generation (Reference Chains & Storyboard)

| Use Case | Model | Rationale |
| --- | --- | --- |
| Character references | gemini-3-pro-image-preview (Nano Banana Pro) | Supports --reference-image chaining, which is essential for building the 4-image reference chain per character. Best quality for anchoring the claymation look. |
| Setting references | gemini-3-pro-image-preview | Same reasoning: environment anchoring via reference images. |
| Storyboard frames | gemini-3-pro-image-preview | Start/End frames need reference injection for character + setting consistency. |
| Quick iterations | gemini-3.1-flash-image-preview (Nano Banana 2) | Faster, for exploratory prompts or retries. |

Aspect ratio: 16:9 for all frames. Higher-than-720p intermediate resolution encouraged.


Video Generation (Principal Photography)

| Use Case | Model | Rationale |
| --- | --- | --- |
| Primary shooting | veo-3.1-fast-generate-001 (Veo 3.1 Fast) | Default. 4/6/8s durations. Supports from-frames (start + end frame interpolation), which is critical for the storyboard-to-video pipeline. Generates audio alongside visuals. |
| Hero shots | veo-3.1-generate-001 (Veo 3.1) | Higher quality for key moments. Same capabilities as Fast. |
| Extend operations | veo-3.1-lite-generate-001 (Veo 3.1 Lite) | Only model that supports extend. Each extend adds exactly 7 seconds. |
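
The extend arithmetic can be sketched as follows. This is a planning aid only, assuming the base clip uses one of the 4/6/8s durations listed for primary shooting and that each extend adds exactly 7 seconds as stated in the table. The function name `plan_extends` and its parameters are hypothetical and not part of any tool.

```python
import math

EXTEND_SECONDS = 7          # from the table: each extend adds exactly 7 seconds
BASE_DURATIONS = (4, 6, 8)  # base clip lengths listed for primary shooting


def plan_extends(target_seconds: float, base_seconds: int = 8) -> tuple[int, int]:
    """Return (number_of_extends, total_seconds) needed to reach the target.

    Hypothetical helper: assumes the shot starts from a 4/6/8s base clip.
    """
    if base_seconds not in BASE_DURATIONS:
        raise ValueError(f"base_seconds must be one of {BASE_DURATIONS}")
    remaining = max(0.0, target_seconds - base_seconds)
    extends = math.ceil(remaining / EXTEND_SECONDS)
    return extends, base_seconds + extends * EXTEND_SECONDS


# A 20s shot from an 8s base needs two extends and lands at 22s.
print(plan_extends(20))
```

Because extends come in fixed 7-second steps, the total usually overshoots the target, so the surplus has to be trimmed in the edit.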

Audio prompting: Include ambient/SFX descriptions (squishing clay sounds, alarm ringing, refrigerator hum) directly in the Veo prompt. Generate with audio enabled for [VO] and [SILENT] shots; for [DIALOGUE] shots, consider --generate-audio=false if we’ll overlay TTS.


Music (Score)

| Use Case | Model | Rationale |
| --- | --- | --- |
| Score segments (~30s) | lyria-3-clip-preview (Lyria 3 Clip) | Quick ~30s clips for per-scene scoring. |
| Extended score (~2:30) | lyria-3-pro-preview (Lyria 3 Pro) | Longer arcs spanning multiple scenes. Preferred for the main score to avoid stitching artifacts. |

Genre direction for prompts: Whimsical, bright, playful orchestral — xylophones, plucked strings, light percussion. Think Aardman/Wallace & Gromit score. Avoid anything dark, brooding, or electronic.


Voice/TTS (Narration)

| Use Case | Model | Rationale |
| --- | --- | --- |
| Narrator VO | gemini-3.1-flash-tts-preview | Only TTS option. 800-char limit per call, so split longer narration into segments. |
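
One way to respect the 800-character limit is to pack whole sentences greedily into segments, so no call cuts the narrator off mid-sentence. The sketch below is illustrative only: `split_narration` is a hypothetical helper, and the sentence-boundary regex is a simple assumption that will not handle abbreviations or unusual punctuation.

```python
import re

TTS_CHAR_LIMIT = 800  # per-call limit stated in the table


def split_narration(text: str, limit: int = TTS_CHAR_LIMIT) -> list[str]:
    """Greedily pack whole sentences into segments of at most `limit` chars."""
    # Split after ., ! or ? followed by whitespace (a deliberately simple rule).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > limit:
            raise ValueError("a single sentence exceeds the limit; shorten it")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current segment is full: close it and start a new one.
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then go to the TTS model as its own call, with the resulting audio concatenated in order.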

Voice selection: Recommend testing Fenrir (warm, measured) or Zephyr (lighter, more playful) for the narrator. The tone should be warm and amused — a storyteller enjoying a silly tale, not a serious documentarian.

TTS timing buffer: Add 20% to the video duration of every dialogue/narration shot, per the playbook mandate.
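
As a worked example of the buffer, the sketch below reads the mandate as "video length must be at least 1.2 times the narration length" and then picks the shortest of the 4/6/8s clip durations that fits. That reading, and the helper name `buffered_clip_seconds`, are assumptions rather than playbook wording.

```python
BUFFER = 0.20               # 20% extra, per the playbook mandate
CLIP_DURATIONS = (4, 6, 8)  # durations listed for primary shooting


def buffered_clip_seconds(narration_seconds: float) -> int:
    """Return the shortest supported clip length covering narration + 20%."""
    needed = narration_seconds * (1 + BUFFER)
    for duration in CLIP_DURATIONS:
        if duration >= needed:
            return duration
    raise ValueError(
        f"{needed:.1f}s needed, which exceeds the longest single clip; "
        "split the narration across shots"
    )


# 4.5s of narration needs 5.4s of video, so the 6s duration is the fit.
print(buffered_clip_seconds(4.5))
```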


Key Technical Notes for This Concept

  1. Claymation is our ally. Model artifacts (wobbly textures, slight inconsistencies) read as genre-authentic fingerprints, not errors. Lean into this.
  2. The melting effect is a progressive transformation. Best approach: generate Start Frame (intact marshmallow) and End Frame (partially melted) per shot, then use from-frames to interpolate the melt. The video model should handle continuous deformation well.
  3. Bright, warm, high-key lighting must be encoded in every prompt. Include tone anchors: “tactile, claymation, whimsical, hand-made, bright” per the Tone Contract.
  4. No lip-sync needed. The design brief explicitly bans spoken dialogue. The entire vocal track is narrator VO; characters are physically expressive but keep their mouths closed. This simplifies video generation significantly.
  5. Reference budget per shot: 1 character sheet (marshmallow man) + 1 setting reference (kitchen) = 2 refs, leaving 1 slot for a second character (alarm clock) or object reference (fridge).
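
Notes 3 and 5 can be enforced mechanically before a shot is submitted. The sketch below appends the tone anchors to a shot description and checks the reference count. It assumes a budget of three reference slots per shot (implied by note 5: two used, one left). The function names and file names are hypothetical.

```python
MAX_REFS = 3  # assumption implied by note 5: 2 refs used, 1 slot left
TONE_ANCHORS = ("tactile", "claymation", "whimsical", "hand-made", "bright")


def build_prompt(description: str) -> str:
    """Append the Tone Contract anchors and lighting cue to a description."""
    base = description.rstrip(". ")
    return f"{base}. Style: {', '.join(TONE_ANCHORS)}; warm, high-key lighting."


def remaining_reference_slots(refs: list[str]) -> int:
    """Return unused reference slots, or raise if over budget."""
    if len(refs) > MAX_REFS:
        raise ValueError(f"{len(refs)} references exceed the budget of {MAX_REFS}")
    return MAX_REFS - len(refs)


# Hypothetical file names, for illustration only.
refs = ["marshmallow_man_sheet.png", "kitchen_setting.png"]
print(build_prompt("Marshmallow man slumps beside the fridge."))
print(remaining_reference_slots(refs))
```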