Step 6 Audio Generation Spec — “The Ferret Incident”
Author: rho-editor | Date: 2026-05-20
Overview
This document defines all audio assets needed for Step 6 (Soundstage). The assets fall into three categories: Score (6 movements), SFX (9 sound events), and Ambience (1 continuous bed).
SCORE — 6 Movements
All score cues should be orchestral or chamber music in a whimsical, precise style: think Alexandre Desplat scoring a Wes Anderson film. No electronic elements.
| # | Movement | Timeline | Duration | Tempo | Instruments | Notes |
|---|---|---|---|---|---|---|
| 1 | Clockwork | 0s-47s (Sc 0+1) | ~47s | 80 BPM | Glockenspiel, celesta, pizzicato strings | Precise, metronomic. Opening theme. Must feel mechanical and controlled. |
| 2 | Discovery | 46s-67s (Sc 2, 2.1-2.5) | ~21s | 80 BPM | Strips down: solo glockenspiel fading out | Near-silence. Tension building. Score nearly disappears. |
| 3 | Escalation | 67s-140s (Sc 2, 2.6-2.21) | ~73s | 100 BPM | Plucked cello, muted trumpet, pizzicato | Quick staccato phrases. Builds through the chase/cleanup. Gets more frantic. |
| 4 | Chandelier | 140s-178s (Sc 3) | ~38s | 70 BPM -> STOP | Full strings build, then SILENCE | Score builds to crescendo, then CUTS DEAD at 3.4-3.5 (crystal drop implied). 2s dead silence before next movement. |
| 5 | Inspector | 178s-233s (Sc 4) | ~55s | 60 BPM | Sparse icy strings, isolated notes | Cold, clinical. Let pen scratch SFX carry rhythm. Score barely present. |
| 6 | Return | 233s-249s (Sc 5) | ~16s | 80 BPM | Opening motif returns (glockenspiel + celesta) | Nostalgic callback to Movement 1. Resolves with final bell ding. Fade with picture. |
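The Timeline and Duration columns above can be cross-checked mechanically. The following is a minimal sketch, not part of the production pipeline: the `Movement` class and `check_timeline` function are hypothetical names introduced here, and the values are copied from the table. It reports any movement whose range disagrees with its stated duration, and any gap or overlap between neighbouring movements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    number: int
    name: str
    start_s: int
    end_s: int
    stated_s: int  # the "~Ns" value from the Duration column


# Values copied from the score table.
MOVEMENTS = [
    Movement(1, "Clockwork", 0, 47, 47),
    Movement(2, "Discovery", 46, 67, 21),
    Movement(3, "Escalation", 67, 140, 73),
    Movement(4, "Chandelier", 140, 178, 38),
    Movement(5, "Inspector", 178, 233, 55),
    Movement(6, "Return", 233, 249, 16),
]


def check_timeline(movements):
    """Return a list of human-readable issues (empty if the table is consistent)."""
    issues = []
    # Each stated duration should equal end minus start.
    for m in movements:
        span = m.end_s - m.start_s
        if span != m.stated_s:
            issues.append(
                f"Movement {m.number}: range is {span}s, stated ~{m.stated_s}s"
            )
    # Neighbouring movements should neither overlap nor leave a gap.
    for prev, nxt in zip(movements, movements[1:]):
        gap = nxt.start_s - prev.end_s
        if gap < 0:
            issues.append(
                f"Movements {prev.number} and {nxt.number} overlap by {-gap}s"
            )
        elif gap > 0:
            issues.append(
                f"Gap of {gap}s between movements {prev.number} and {nxt.number}"
            )
    return issues


if __name__ == "__main__":
    for issue in check_timeline(MOVEMENTS):
        print(issue)
```

With the table values as written, the only item reported is a 1s overlap between Movements 1 and 2 (47s vs 46s). The spec does not say whether that overlap is an intended crossfade or a typo, so it is flagged here rather than corrected.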
Score Mixing
- Track volume: -2 dB
- Duck under voice: -12 dB (both VO and DLG tracks)
- Movements should have clean starts/ends with 0.5s fades at boundaries
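For reference, the dB figures above map to linear gain via the standard amplitude formula `gain = 10 ** (dB / 20)`. The sketch below is illustrative only. Two things are assumptions, not statements from this spec: whether the -12 dB duck is applied on top of the -2 dB track volume (relative) or replaces it (absolute), and the linear shape of the 0.5s boundary fade. All function names are hypothetical.

```python
SCORE_TRACK_DB = -2.0   # from "Track volume"
SCORE_DUCK_DB = -12.0   # from "Duck under voice"
FADE_S = 0.5            # from "0.5s fades at boundaries"


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def score_gain(voice_active: bool, duck_is_relative: bool = True) -> float:
    """Linear gain of the score track, with or without voice present.

    ASSUMPTION: duck_is_relative=True stacks the duck on the track volume
    (-2 + -12 = -14 dB); False treats -12 dB as the absolute ducked level.
    The spec does not say which is intended.
    """
    if not voice_active:
        return db_to_gain(SCORE_TRACK_DB)
    if duck_is_relative:
        return db_to_gain(SCORE_TRACK_DB + SCORE_DUCK_DB)
    return db_to_gain(SCORE_DUCK_DB)


def fade_envelope(t: float, start_s: float, end_s: float, fade_s: float = FADE_S) -> float:
    """Fade multiplier (0..1) for a movement spanning start_s..end_s.

    ASSUMPTION: linear fade-in and fade-out; the spec only gives the length.
    """
    if t <= start_s or t >= end_s:
        return 0.0
    return min(1.0, (t - start_s) / fade_s, (end_s - t) / fade_s)


if __name__ == "__main__":
    print(f"score, no voice:   {score_gain(False):.3f}")
    print(f"score, under voice: {score_gain(True):.3f}")
```

Whichever ducking interpretation applies should be confirmed against the timeline tool before mixing, since the two differ by 2 dB.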
SFX — 9 Sound Events
| # | Sound | Shot(s) | Timeline Position | Duration | Priority | Notes |
|---|---|---|---|---|---|---|
| 1 | Bell ding x3 | 1.3 | 20s-29s (evenly spaced at ~20s, 23s, 26s) | ~0.5s each | HIGH | Silver desk bell. Clear, bright ding. Same bell sound for all. |
| 2 | Fern crash | 2.3->2.4 | ~57s-58s (end of 2.3 / start of 2.4) | ~1.5s | MED | Off-screen: ceramic pot tipping, dirt spill, fern thud. Muffled. |
| 3 | Clock tick (ambient) | 1.1, 2.5, 2.18, 3.7 | Multiple ECU inserts | Fills each shot | MED | Steady, mechanical tick-tock. Only on clock ECU shots. Not continuous. |
| 4 | Suitcase slam | 2.12 | ~97s (2s into shot) | ~1s | MED | Leather suitcase being slammed shut. Decisive. |
| 5 | Bell slap x1 | 2.15 | ~111s (~1.5s into shot) | ~0.5s | HIGH | Ferret hitting bell. Slightly different from proper ding — more of a chaotic slap. |
| 6 | Crystal drop + shatter | 3.4->3.5 | ~162s-164s (end of 3.4, start of 3.5) | ~2s | HIGH | Crystal pendant falling + shattering on marble. The moment score goes silent. |
| 7 | Pen scratch | 4.4 | 205s-209s | ~4s | MED | Pen on paper/clipboard. Steady scratching throughout Vance writing. |
| 8 | Door open/close | 4.1, 4.8 | 178s (open), 233s (close) | ~1.5s each | LOW | Heavy brass hotel doors. Weighty, slow creak + click. |
| 9 | Final bell ding x1 | 5.2 | ~245s (6s into shot) | ~0.5s | CRITICAL | THE payoff. Same bell as #1 but single, definitive ding. Last sound before fade to black. |
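As a sanity check on the table above, each cue time can be compared against the window of the scene its shot belongs to. This is a rough sketch under stated assumptions: the scene windows are inferred from the Timeline column of the score table, the scene number is taken to be the part of the shot ID before the dot, and `scene_of`, `cues_outside_scene`, and `bell_uses` are hypothetical helper names.

```python
# Scene windows in seconds, inferred from the score table's Timeline column.
# Scene 1 uses the combined "Sc 0+1" range because the spec gives no split.
SCENE_WINDOWS = {
    1: (0, 47),
    2: (46, 140),
    3: (140, 178),
    4: (178, 233),
    5: (233, 249),
}

# (label, shot, cue start in seconds), copied from the SFX table.
# The clock tick (#3) is omitted because the table gives no timestamps for it.
SFX_CUES = [
    ("Bell ding 1", "1.3", 20),
    ("Bell ding 2", "1.3", 23),
    ("Bell ding 3", "1.3", 26),
    ("Fern crash", "2.3", 57),
    ("Suitcase slam", "2.12", 97),
    ("Bell slap", "2.15", 111),
    ("Crystal drop + shatter", "3.4", 162),
    ("Pen scratch", "4.4", 205),
    ("Door open", "4.1", 178),
    ("Door close", "4.8", 233),
    ("Final bell ding", "5.2", 245),
]


def scene_of(shot: str) -> int:
    """Scene number from a shot ID such as '2.15' (assumed 'scene.shot')."""
    return int(shot.split(".")[0])


def cues_outside_scene(cues, windows):
    """Labels of cues whose start time falls outside their scene window."""
    bad = []
    for label, shot, t in cues:
        start, end = windows[scene_of(shot)]
        if not (start <= t <= end):
            bad.append(label)
    return bad


def bell_uses(cues):
    """All cues that strike the desk bell (dings and the slap)."""
    return [label for label, _, _ in cues if "bell" in label.lower()]


if __name__ == "__main__":
    print("Out-of-window cues:", cues_outside_scene(SFX_CUES, SCENE_WINDOWS))
    print("Bell uses:", len(bell_uses(SFX_CUES)))
```

With the values as listed, every cue falls inside its scene window, and the bell is struck five times in total (three dings, one slap, one final ding).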
SFX Mixing
- Track volume: -3 dB
- No ducking — SFX plays at consistent level
- Bell dings (#1 and #9) must use the same sample; the bell slap (#5) should read as the same bell, struck differently
AMBIENCE — 1 Continuous Bed
| Sound | Scenes | Duration | Notes |
|---|---|---|---|
| Hotel lobby ambience | All (0-5) | Full film ~249s | Very subtle: distant air conditioning hum, faint marble echo. VERY low in mix. |
Ambience Mixing
- Track volume: -18 dB
- No ducking — always present but barely perceptible
- Provides a “live room” feel to prevent dead silence between events
Generation Priority
- Bell ding (used 5 times across the film; the iconic sound)
- Crystal shatter (dramatic pivot point)
- Score Movement 1: Clockwork (sets the tone)
- Score Movement 6: Return (emotional payoff)
- All remaining score movements
- Remaining SFX
- Ambience (can be generic hotel lobby)
Timeline Track Additions Needed
When assets arrive, add them to timeline.json:
- SCORE track: Already has placeholder (track index 3, role: “music”, vol: -2 dB, duck: -12 dB)
- SFX track: Already has placeholder (track index 4, role: “sfx”, vol: -3 dB)
- AMB track: NEW, needs to be added (role: “sfx”, vol: -18 dB)
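The AMB addition could look roughly like the sketch below. The actual schema of timeline.json is not shown in this spec, so everything about the structure here is an assumption: the `tracks` list, the key names `index`, `name`, `role`, `vol`, and `duck`, and the choice to give AMB the next free index. The helper `add_amb_track` is hypothetical; adapt it to the real file before use.

```python
import copy
import json

# HYPOTHETICAL stand-in for the relevant part of timeline.json.
# Only the values (index 3/4, roles, volumes, duck) come from the spec;
# the key names and layout are assumed.
EXAMPLE_TIMELINE = {
    "tracks": [
        {"index": 3, "name": "SCORE", "role": "music", "vol": -2, "duck": -12},
        {"index": 4, "name": "SFX", "role": "sfx", "vol": -3},
    ]
}


def add_amb_track(timeline: dict) -> dict:
    """Return a copy of the timeline with an AMB track appended if missing."""
    updated = copy.deepcopy(timeline)  # leave the caller's data untouched
    tracks = updated.setdefault("tracks", [])
    if any(t.get("name") == "AMB" for t in tracks):
        return updated  # already present, nothing to do
    # ASSUMPTION: the new track takes the next free index.
    next_index = max((t["index"] for t in tracks), default=-1) + 1
    # No "duck" key: the ambience bed is never ducked.
    tracks.append({"index": next_index, "name": "AMB", "role": "sfx", "vol": -18})
    return updated


if __name__ == "__main__":
    print(json.dumps(add_amb_track(EXAMPLE_TIMELINE), indent=2))
```

The function is idempotent, so running it again after the AMB track exists leaves the timeline unchanged.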