

Mu Team — "The Paper Frontier"

Mu Team — Step 1 Technical Feasibility Assessment

Author: mu-techlead (Director of Photography)
Date: 2026-05-18

Concept Summary

Two photo-realistic matchbox toy cars (Zip and Rusty) traverse a hand-drawn pencil-sketch world in a happy adventure. The cars “speak” by opening and closing their hoods.


Generatability Assessment

1. Mixed-Media Visual Style — FEASIBLE (Medium Risk)

The Ask: Photo-realistic die-cast toy cars composited onto hand-drawn pencil sketch backgrounds in a single frame.

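Assessment: This mixed-media look is achievable with careful prompting. Generative image models (especially Gemini/Nano Banana Pro) generally handle style fusion well when given explicit instructions. The key is prompt specificity: name the medium of each element in the frame (a photo-realistic die-cast car in front of a hand-drawn pencil-sketch background) rather than leaving the model to infer which style applies where.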

Optimal API: gemini-3-pro-image-preview (Nano Banana Pro) for all image generation. It supports --reference-image chaining (critical for character consistency) and handles style-mixing better than Imagen for this use case.
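A minimal sketch of what that specificity can look like, assuming prompts are assembled as plain Python strings before being passed to genmedia-image generate. The helper name and exact wording are hypothetical; the point is that the car and the background each get their own explicitly named medium.

```python
def mixed_media_prompt(car: str, action: str, setting: str) -> str:
    """Compose a prompt that names the medium of each layer explicitly (hypothetical helper)."""
    # Foreground: a physical object, described in photographic terms.
    foreground = (
        f"a photo-realistic die-cast matchbox toy car ({car}), "
        "tangible macro photography of toy cars, real painted metal"
    )
    # Background: a drawing, described in drawing terms.
    background = f"a hand-drawn pencil sketch of {setting}, visible graphite strokes on paper"
    return f"Mixed-media frame: {foreground}, {action}, set against {background}."


if __name__ == "__main__":
    print(mixed_media_prompt(
        "Zip, a yellow vintage sports car",
        "rolling across a sketched bridge",
        "a winding country road",
    ))
```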

2. Hood-Open Lip-Sync — HIGH RISK (Needs Early Testing)

The Ask: Cars speak by mechanically opening/closing their hoods in sync with dialogue. Use Veo native audio to auto-sync this.

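Assessment: This is the single riskiest element in the production. A hood is not a mouth, and nothing guarantees that Veo native audio will drive it in time with the dialogue: the hood may stay shut or move out of sync, or the model may add a face instead (see Anthropomorphism Drift below).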

Recommendation: Before committing to the full pipeline, we MUST run a quick proof-of-concept test in Step 3. Generate a single 4s clip of a toy car opening and closing its hood while “speaking” and check whether the motion tracks the dialogue.
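One way to make the Step 3 verdict less subjective, offered as a hypothetical heuristic rather than an established metric: annotate the PoC clip frame by frame (hood open or closed, speech audible or not) and measure how often the two agree within a small frame tolerance. The annotation lists are assumed inputs, marked by hand or produced by a separate tool; nothing here is part of the genmedia toolchain.

```python
from typing import Sequence


def _active_near(flags: Sequence[bool], i: int, tolerance: int) -> bool:
    """True if any frame within +/- tolerance of frame i is flagged."""
    lo, hi = max(0, i - tolerance), min(len(flags), i + tolerance + 1)
    return any(flags[lo:hi])


def sync_scores(hood_open: Sequence[bool], speaking: Sequence[bool],
                tolerance: int = 2) -> tuple[float, float]:
    """Return (recall, precision) of hood motion against speech, per frame.

    recall: share of speech frames with the hood open nearby.
    precision: share of hood-open frames with speech nearby.
    """
    if len(hood_open) != len(speaking):
        raise ValueError("both annotations must cover the same frames")
    speech = [i for i, s in enumerate(speaking) if s]
    opened = [i for i, h in enumerate(hood_open) if h]
    recall = sum(_active_near(hood_open, i, tolerance) for i in speech) / len(speech) if speech else 0.0
    precision = sum(_active_near(speaking, i, tolerance) for i in opened) / len(opened) if opened else 0.0
    return recall, precision
```

High recall with low precision suggests the hood flaps regardless of speech; the reverse suggests it moves only occasionally. Either result can still be acceptable if the dialogue is written to survive approximate movement.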

3. Character Consistency — FEASIBLE (Standard Pipeline)

The Ask: Two distinct toy cars (yellow vintage sports car, blue battered pickup truck) that look the same across all shots.

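Assessment: This is well-suited to the reference-chain workflow: generate one approved character sheet per car, then pass it as a reference image for every subsequent frame featuring that car.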

Optimal API: gemini-3-pro-image-preview with --reference-image chaining.
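The per-shot reference selection can be sketched as follows, assuming approved references live in a simple name-to-path lookup. `RefLibrary` and `refs_for_shot` are hypothetical names; the limits (at most two characters per shot, at most three references per Veo shot, budgeted as first car sheet, setting, then second car or an object) come from Key Technical Constraints.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_CHARACTERS_PER_SHOT = 2  # playbook mandate
MAX_REFS_PER_SHOT = 3        # Veo reference-image limit


@dataclass
class RefLibrary:
    """Approved reference images keyed by name (paths are hypothetical)."""
    character_sheets: dict[str, str]
    settings: dict[str, str]


def refs_for_shot(lib: RefLibrary, cars: Sequence[str], setting: str,
                  extra: Optional[str] = None) -> list[str]:
    """Pick references in budget order: first car's sheet, setting, then second car or an object."""
    if not 1 <= len(cars) <= MAX_CHARACTERS_PER_SHOT:
        raise ValueError(f"expected 1-{MAX_CHARACTERS_PER_SHOT} cars, got {len(cars)}")
    refs = [lib.character_sheets[cars[0]], lib.settings[setting]]
    refs += [lib.character_sheets[c] for c in cars[1:]]
    if extra is not None:
        refs.append(extra)
    if len(refs) > MAX_REFS_PER_SHOT:
        raise ValueError(f"{len(refs)} references exceed the limit of {MAX_REFS_PER_SHOT}")
    return refs
```

A two-car shot therefore uses all three slots, so an extra object reference is only possible in single-car shots. How the returned paths are handed to --reference-image (one flag per path or otherwise) depends on the CLI and is not confirmed here.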

4. Anthropomorphism Drift — MEDIUM RISK

The Ask: Cars should look like real die-cast toys. No faces, no eyes, no limbs.

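Assessment: Models have a strong bias toward anthropomorphizing objects, especially when they’re described as “speaking” or having personalities. We’ll need explicit negative constraints in every prompt (no faces, no eyes, no limbs), and shot descriptions that state the hood’s mechanical motion instead of calling the car’s action “speaking”.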
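A minimal prompt lint along these lines could catch drift before anything is rendered. It assumes prompts are plain strings; the watch-list is a hypothetical starting point to tune after test renders, and only the negative clause itself restates the brief. Words such as “speaking” are flagged so that dialogue shots describe the hood opening and closing instead.

```python
import re

# The negative clause restates the brief: real die-cast toys, no faces, no eyes, no limbs.
NEGATIVE_CLAUSE = "Real die-cast toys only: no faces, no eyes, no limbs."

# Hypothetical watch-list of words that tend to invite faces or limbs; tune it after test renders.
RISKY_TERMS = (
    "face", "faces", "eye", "eyes", "mouth", "mouths", "smile", "smiles", "smiling",
    "grin", "grins", "wink", "winks", "arm", "arms", "leg", "legs",
    "speak", "speaks", "speaking", "talk", "talks", "talking",
)
_RISKY = re.compile(r"\b(" + "|".join(RISKY_TERMS) + r")\b", re.IGNORECASE)


def risky_terms(prompt: str) -> list[str]:
    """Return watch-list words found in the prompt, lower-cased, sorted, de-duplicated."""
    return sorted({m.group(1).lower() for m in _RISKY.finditer(prompt)})


def harden(prompt: str) -> str:
    """Reject risky wording, then append the negative clause exactly once."""
    body = prompt.replace(NEGATIVE_CLAUSE, "").strip()  # lint only the hand-written part
    found = risky_terms(body)
    if found:
        raise ValueError(f"rephrase before rendering, risky terms: {found}")
    return f"{body} {NEGATIVE_CLAUSE}"
```

Stripping the clause before linting keeps `harden` idempotent, since the clause itself contains watch-list words.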

5. Genre Drift Prevention — STANDARD

The Ask: Joyful, optimistic, warm, playful.

Assessment: The design brief’s tone anchors are solid. Every prompt must include: “Joyful, bright, warm sunlight, optimistic, childlike wonder, colorful pencil sketch background, tangible macro photography of toy cars, playful.” Without these anchors, the model tends to drift toward a moody, dramatic, or noir look.
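The anchor requirement is easy to enforce mechanically. This sketch (function names hypothetical) appends whichever anchors a prompt is missing and returns complete prompts untouched, so it is safe to run more than once in a pipeline.

```python
# Tone anchors from the design brief (lower-cased for matching).
TONE_ANCHORS = (
    "joyful", "bright", "warm sunlight", "optimistic", "childlike wonder",
    "colorful pencil sketch background", "tangible macro photography of toy cars", "playful",
)


def missing_anchors(prompt: str) -> list[str]:
    """Return the anchors that do not appear in the prompt (case-insensitive substring match)."""
    lowered = prompt.lower()
    return [a for a in TONE_ANCHORS if a not in lowered]


def with_tone_anchors(prompt: str) -> str:
    """Append any missing anchors; prompts that already have all of them are returned unchanged."""
    missing = missing_anchors(prompt)
    if not missing:
        return prompt
    return f"{prompt.rstrip(' .')}. {', '.join(missing).capitalize()}."
```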


Optimal API Map

| Asset Type | Tool | Model | Notes |
|---|---|---|---|
| Character refs & storyboard frames | genmedia-image generate | Nano Banana Pro (default) | Reference chaining for consistency |
| High-fidelity object refs | genmedia-image imagen | Imagen 4 | Use for isolated car renders if Nano Banana struggles with photo-realism |
| Principal photography (video) | genmedia-video from-image / generate | Veo 3.1 (veo-3.1-generate-001) | Audio enabled for dialogue shots; --generate-audio=false for VO shots |
| Video extension | genmedia-video extend | Veo 3.1 Lite (veo-3.1-lite-generate-001) | Only model supporting extend |
| Frame interpolation | genmedia-video from-frames | Veo 3.1 | Start/end frame storyboard animation |
| Narrator voiceover | genmedia-voice generate | Gemini 3.1 Flash TTS | Voice TBD; suggest Fenrir or Orus for a warm narrator |
| Car dialogue (if TTS fallback) | genmedia-voice generate | Gemini 3.1 Flash TTS | Zip: a bright/quick voice; Rusty: a slower/gruffer voice |
| Score/music | genmedia-music generate | Lyria 3 Pro | 2:30 tracks for score arcs; Lyria 3 Clip for 30s stingers |
| Assembly | genmedia-assemble | N/A (FFmpeg) | Timeline-based assembly for precise control |
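The video rows of the table reduce to a small decision. The sketch below, assuming each shot is tagged as either dialogue or voice-over in the shot list, returns the model ID and extra flags for a video job. The model IDs and `--generate-audio=false` come from the table; the function and the shot tags are hypothetical, and how the model ID is passed to genmedia-video is not specified here.

```python
# Model IDs and the audio flag come from the API map; shot kinds are hypothetical tags.
VEO_MODEL = "veo-3.1-generate-001"
VEO_EXTEND_MODEL = "veo-3.1-lite-generate-001"  # listed as the only model supporting extend


def video_plan(kind: str, extend: bool = False) -> tuple[str, list[str]]:
    """Return (model ID, extra flags) for a video job.

    kind: "dialogue" keeps native audio on so the cars' lines are generated with the clip;
          "vo" turns it off because the narrator is generated separately with TTS.
    """
    if kind not in ("dialogue", "vo"):
        raise ValueError(f"unknown shot kind: {kind!r}")
    model = VEO_EXTEND_MODEL if extend else VEO_MODEL
    flags = ["--generate-audio=false"] if kind == "vo" else []
    return model, flags
```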

Key Technical Constraints to Feed Back to Idea Person

  1. Max 2 characters per shot (playbook mandate) — works perfectly for Zip + Rusty buddy dynamic.
  2. Max 3 reference images per Veo shot — budget: 1 car character sheet + 1 setting ref + 1 second car OR object ref.
  3. Video durations: Base clips 4/6/8s. Extensions add exactly 7s each. Plan shot durations around these increments + 4s overhang.
  4. TTS text limit: 800 characters per call. Long narration lines must be split.
  5. Hood lip-sync is unproven — the story should not depend on perfect phoneme-level sync. Write dialogue that works even with approximate mechanical movement.
  6. 16:9 aspect ratio throughout. Final output 1280x720.
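Items 3 and 4 are plain arithmetic and can be checked during shot planning. In this sketch (function names hypothetical), `plan_duration` finds the shortest generatable length, a 4, 6 or 8s base plus whole 7s extensions, at or above a target, and `split_for_tts` packs narration into chunks under the 800-character limit, breaking at sentence ends where possible. It does not model the 4s overhang mentioned in item 3.

```python
import re
import textwrap

BASE_CLIPS = (4, 6, 8)  # seconds, base Veo clip lengths
EXTENSION = 7           # seconds added by each extend call
TTS_LIMIT = 800         # characters per TTS call


def plan_duration(target: float, max_extensions: int = 10) -> tuple[int, int, int]:
    """Return (base, extensions, total) for the shortest generatable length >= target."""
    best = None
    for ext in range(max_extensions + 1):
        for base in BASE_CLIPS:
            total = base + EXTENSION * ext
            if total >= target and (best is None or total < best[2]):
                best = (base, ext, total)
    if best is None:
        raise ValueError(f"{target}s needs more than {max_extensions} extensions")
    return best


def split_for_tts(text: str, limit: int = TTS_LIMIT) -> list[str]:
    """Pack whole sentences into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Last resort: an over-long sentence is wrapped on whitespace.
        pieces = [sentence] if len(sentence) <= limit else textwrap.wrap(sentence, limit)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

For example, a 10s beat comes out as a 4s base plus one extension (11s), leaving 1s to trim in assembly.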

Verdict

The concept is generatable. The mixed-media style is well-suited to prompt engineering, and the two-character structure keeps complexity manageable. The main risk is the hood lip-sync mechanism — we’ll validate it early in Step 3 and have a clear fallback (TTS overlay). The happy adventure tone requires active prompt engineering to prevent genre drift, but the tone anchors in the design brief are strong.

Ready to proceed to Step 2 once the Idea Person delivers high_concept.md.