Step 1 Final Generatability Sign-Off — “The Midnight Audit”
Reviewer: kappa-techlead (Director of Photography)
Date: 2026-05-18
Status: ✅ APPROVED, with notes
Critical Discovery: Genre Strategy Is AI-Native
The treatment makes a brilliant creative choice that I need to call out: the intended visual style (moody, noir, high-contrast, dramatic) is exactly what AI video models tend to default to. We don’t need to fight genre drift; we LEAN INTO it. The comedy comes from the juxtaposition between deadly-serious cinematography and absurdly banal content (a stapler discussing synergy). This means every Veo generation’s natural tendency toward dramatic realism actively serves the mockumentary conceit.
Updated Tone Anchors (replacing my earlier notes):
claymation stop-motion, rough tactile clay texture with visible thumbprints, moody noir lighting, harsh fluorescent overhead, deep dramatic shadows, gritty documentary style, macro photography perspective, miniature desk set, extreme close-up, corporate office at night, deadpan serious tone
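Because these anchors are meant to ride along on every generation, it may help to append them mechanically rather than by hand. Below is a minimal Python sketch, assuming prompts are plain strings. `TONE_ANCHORS` and `build_prompt` are hypothetical names, not part of `genmedia-video` or any Veo API.

```python
# Hypothetical prompt helper: TONE_ANCHORS copies the anchor list above;
# build_prompt is an illustrative name, not a generation-tool API.
TONE_ANCHORS = [
    "claymation stop-motion",
    "rough tactile clay texture with visible thumbprints",
    "moody noir lighting",
    "harsh fluorescent overhead",
    "deep dramatic shadows",
    "gritty documentary style",
    "macro photography perspective",
    "miniature desk set",
    "extreme close-up",
    "corporate office at night",
    "deadpan serious tone",
]


def build_prompt(shot_description: str) -> str:
    """Put the shot-specific action first, then append every shared tone anchor."""
    description = shot_description.strip().rstrip(".")
    return f"{description}. " + ", ".join(TONE_ANCHORS)


if __name__ == "__main__":
    print(build_prompt("Stanton the stapler stares into the lens, silent and heavy."))
```

Keeping the anchors in one list means a tone change is a one-line edit rather than a search through every shot prompt.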
Character Generatability Assessment
| Character | Difficulty | Notes |
|---|---|---|
| Stanton (Stapler) | ⭐⭐ Easy | Distinctive, simple silhouette. All-black metal with silver chip marks. Heavy, slow movement = Veo-friendly. |
| Clippy (Paperclip) | ⭐⭐⭐ Medium | Simple shape but needs “expressive eyes” (bent wire). Trembling = good motion. Risk: may look too generic without strong reference chain. |
| Highlighter | ⭐⭐ Easy | Chunky yellow with chewed cap. Very distinctive color/shape. Barely moves = trivially simple for Veo. |
| The Boss (Hand) | ⭐⭐ Easy | Realistic human hand against claymation = strong visual contrast. Only appears in 2-3 shots. Shadow-only shots are even simpler. |
Max characters per shot: The treatment naturally stays within the 2-character limit. Interviews are solo. Group scenes (briefing, montage) involve 2-3 characters, and the 3-character moments can be split into pairs.
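The 2-character limit can also be enforced on the shot list itself. Here is a minimal sketch, assuming each group scene is described by a speaker plus a cast list. `MAX_CHARACTERS` and `split_into_shots` are illustrative names. The speaker-solo-then-pairs rule simply mirrors the briefing split proposed in Act II.

```python
# Illustrative shot-list helper; MAX_CHARACTERS mirrors the 2-character limit
# in this review, and split_into_shots is a hypothetical name.
MAX_CHARACTERS = 2


def split_into_shots(speaker: str, cast: list[str]) -> list[list[str]]:
    """Give the speaker a solo shot, then group the remaining cast in chunks
    of at most MAX_CHARACTERS (e.g. a reaction shot of two listeners)."""
    listeners = [name for name in cast if name != speaker]
    shots = [[speaker]]
    for start in range(0, len(listeners), MAX_CHARACTERS):
        shots.append(listeners[start:start + MAX_CHARACTERS])
    return shots


if __name__ == "__main__":
    # The Act II briefing: Stanton addresses camera, then Clippy + Highlighter react.
    print(split_into_shots("Stanton", ["Stanton", "Clippy", "Highlighter"]))
    # [['Stanton'], ['Clippy', 'Highlighter']]
```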
Setting Generatability
Single master setting (Desk 4B) with sub-zones. This is excellent: one strong setting reference image covers roughly 80% of shots. Sub-settings are just different angles and areas of the same desk.
| Sub-Setting | Generatability | Reference Image Needed? |
|---|---|---|
| Desk 4B (wide establishing) | ⭐⭐⭐⭐⭐ | Yes — primary setting ref |
| Plains of Clack (keyboard) | ⭐⭐⭐⭐ | Optional — macro of keyboard surface |
| The Abyss (desk edge) | ⭐⭐⭐⭐ | No — describe in prompt |
| The Monolith (monitor shadow) | ⭐⭐⭐⭐⭐ | No — dark background with blue glow |
| Coffee Ring Craters | ⭐⭐⭐⭐ | No — texture detail in prompt |
Shot-by-Shot Feasibility (Act Breakdown)
Act I: The Gathering Storm
- Establishing “drone” shot over desk: `genmedia-video generate` with slow tracking prompt. ✅
- Stanton interview (extreme close-up, static): Ideal for Veo. ✅
- Clippy interview (shaky camera): Prompt “slight handheld movement, documentary style.” ✅
- Highlighter interview (static, cynical): Easiest shot type. ✅
- B-roll of Clippy struggling with Post-Its: `from-image` with a clear start frame. ✅
Act II: The Discovery
- Stanton patrolling keyboard: Medium motion, single character. ✅
- Discovery of paper stack: Dramatic zoom-in. ✅
- Briefing scene (3 characters): Split into 2 shots — Stanton addressing camera, then cut to Clippy+Highlighter reaction. ✅
- Highlighter’s refusal + exit: Strong dialogue scene, static. ✅
Act III: The Futile Mobilization
- Training montage: Series of short clips. Break into 4-6 individual shots (Clippy attempts task, fails). Each is simple. ✅
- Stanton’s breakdown interview: Static close-up, moody lighting. ✅
Act IV: The Crisis
- HVAC wind sequence: ⚠️ Most challenging. Paper fluttering, dust blowing, dynamic motion. Strategy: generate as `[SILENT]` with dramatic music overlay. Use `from-image` with a start frame showing paper beginning to slide.
- Clippy’s hero run: ⚠️ Break into 4+ short shots (3-4s each): (1) Clippy starts running, (2) leap over key gap, (3) slide through condensation, (4) clamping onto papers. Each individually is achievable.
- Slow-motion climax: Veo doesn’t do literal slow-mo, but “slow, deliberate movement” + dramatic music + editing can create the effect. The Editor can add slow-mo in post if needed via ffmpeg.
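For the slow-motion climax, one common post approach is ffmpeg’s `setpts` video filter, which rescales frame timestamps (`setpts=2.0*PTS` plays a clip at half speed). The helper below only builds the argument list. The function name, the file names, and the choice to drop audio with `-an` are assumptions for illustration, not an agreed pipeline. Dropping audio follows from the HVAC sequence being planned as silent, with music added in the edit.

```python
import shlex


def slowmo_command(src: str, dst: str, factor: float = 2.0) -> list[str]:
    """Build (not run) an ffmpeg call that stretches frame timestamps by `factor`.

    Hypothetical helper: setpts=factor*PTS slows playback when factor > 1,
    and -an drops the audio track because music is laid in during the edit.
    """
    if factor <= 1.0:
        raise ValueError("factor must be > 1 to slow the clip down")
    return [
        "ffmpeg", "-i", src,
        "-filter:v", f"setpts={factor}*PTS",
        "-an",
        dst,
    ]


if __name__ == "__main__":
    # File names are placeholders, not real project assets.
    cmd = slowmo_command("hero_run_04.mp4", "hero_run_04_slow.mp4", 2.5)
    print(shlex.join(cmd))
```

Plain `setpts` holds frames rather than interpolating new ones, so motion reads as stepped. That may suit the stop-motion look, but it is worth checking on the actual clips.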
Act V: The Resolution
- Hand descending: “Massive realistic human hand entering a claymation miniature desk set from above.” ✅
- Hand grabbing Highlighter: Simple motion. ✅
- Final interviews (all 3): Static close-ups. ✅
- Wide pull-back: Establishing shot, slow zoom out. ✅
- Rubber band rolling: Simple B-roll insert. ✅
Risks & Mitigations
| # | Risk | Severity | Mitigation |
|---|---|---|---|
| 1 | Anthropomorphized objects lack “expression” in Veo | Medium | Generate strong reference chains with clear facial features. For interviews, use from-image with carefully crafted start frames showing exact expression. TTS carries the emotional performance. |
| 2 | Act IV action sequence too complex for Veo | Medium | Decompose into 4-6 simple 3-4s shots. No continuous complex action. Let the Editor build the sequence rhythm from short clips. |
| 3 | Claymation texture inconsistent across shots | Low | This is actually a feature. Real claymation has texture variation between frames. Include “rough, handmade, visible thumbprints” in every prompt. |
| 4 | Scale/perspective drift (desk looks normal-sized instead of vast) | Medium | Always include “macro photography, miniature set, extreme close-up perspective, shallow depth of field” in prompts. Setting reference image establishes the scale. |
| 5 | Realistic hand vs claymation style clash | Low | Intentional contrast. Prompt: “realistic human hand entering a claymation stop-motion miniature set.” The uncanny contrast IS the creative choice. |
| 6 | Story is long; may exceed 5 min | Medium | Editor’s domain. Five acts are dense for a 3-5 min runtime. May need to trim the Act III montage or tighten Act IV. Flag for beat sheet planning. |
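Risks 3 and 4 both depend on specific phrases appearing in every prompt, which makes them easy to check automatically before anything is sent for generation. Here is a minimal sketch, assuming prompts are plain strings and a case-insensitive substring match is good enough. `REQUIRED_PHRASES` and `missing_phrases` are hypothetical names. The phrase lists are copied from the mitigations in the table above.

```python
# Illustrative prompt lint for risks 3 and 4; the phrase lists come from the
# mitigations column, and missing_phrases is a hypothetical helper.
REQUIRED_PHRASES = {
    "texture (risk 3)": ["rough", "handmade", "visible thumbprints"],
    "scale (risk 4)": [
        "macro photography",
        "miniature set",
        "extreme close-up perspective",
        "shallow depth of field",
    ],
}


def missing_phrases(prompt: str) -> dict[str, list[str]]:
    """Return, per risk, the required phrases absent from the prompt (case-insensitive)."""
    text = prompt.lower()
    report = {}
    for risk, phrases in REQUIRED_PHRASES.items():
        missing = [p for p in phrases if p.lower() not in text]
        if missing:
            report[risk] = missing
    return report


if __name__ == "__main__":
    print(missing_phrases("A stapler on a desk, macro photography, shallow depth of field"))
```

An empty dict means the prompt passes. Anything else names the risk and the phrases still to add.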
Duration Estimation (Rough)
| Act | Estimated Duration | Key Content |
|---|---|---|
| Act I | 45-60s | 3 interviews + establishing B-roll |
| Act II | 45-60s | Patrol, discovery, briefing, Highlighter refusal |
| Act III | 30-45s | Training montage + Stanton breakdown |
| Act IV | 45-60s | HVAC crisis + Clippy’s hero run |
| Act V | 30-45s | Hand descent, 3 final interviews, pull-back |
| Total | 3:15 - 4:30 | Within target ✅ |
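As a quick check on the arithmetic, the per-act ranges in the table do sum to the stated total. Below is a minimal Python sketch. The figures are the review’s rough estimates, not measured cut lengths, and `ACT_ESTIMATES` and `mmss` are illustrative names.

```python
# Per-act (low, high) estimates in seconds, copied from the duration table.
ACT_ESTIMATES = {
    "Act I": (45, 60),
    "Act II": (45, 60),
    "Act III": (30, 45),
    "Act IV": (45, 60),
    "Act V": (30, 45),
}


def mmss(seconds: int) -> str:
    """Format a second count as M:SS."""
    return f"{seconds // 60}:{seconds % 60:02d}"


low = sum(lo for lo, _ in ACT_ESTIMATES.values())
high = sum(hi for _, hi in ACT_ESTIMATES.values())
headroom = 5 * 60 - high  # seconds left under a 5-minute ceiling

print(f"Total: {mmss(low)} - {mmss(high)}")  # Total: 3:15 - 4:30
print(f"Headroom under 5:00: {headroom}s")   # Headroom under 5:00: 30s
```

The upper bound leaves only about 30 s of headroom under a 5-minute ceiling, which is consistent with the duration being tight at the upper end.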
Verdict
APPROVED. This concept is highly generatable. The noir-mockumentary visual style works WITH AI model defaults rather than against them. The single-setting, small-cast structure keeps our reference chain budget manageable. The biggest technical challenge, the Act IV action sequence, is solvable by decomposing it into short, simple shots.
One adjustment request for Step 2: duration is tight at the upper end. The Idea Person should be aware that the 5 acts may need compression during beat sheet planning. Act III (training montage) is the most compressible without losing narrative value.
The DP signs off on Step 1.