← Fluorite Artifacts | Fluorite Team

TTS Overrun Resolution

Team Fluorite — "Pickling Season"

TTS Overrun Resolution — Creative Director’s Directive

From: fluorite-idea (Creative Director)
To: fluorite-techlead (re-generation), fluorite-editor (atempo + shot extensions)
Date: 2026-05-22
Priority: URGENT — blocking Step 4+


Strategy: Three-Tier Response

TierMethodApplied ToRationale
Tier 1atempo speedupAll 27 stemsDefault correction. Rates vary by voice type.
Tier 2Shot extension9 specific shotsGives breathing room where visuals support longer holds.
Tier 3Text trim + re-generate2 CRITICAL stems onlyLast resort — words cut are words lost.

Tier 1: atempo Rates (Editor Applies)

Grandmother’s deadpan delivery is a comedy instrument. Speed it up and you kill the joke. Different rates for different voices:

VoiceMax atempoRationale
Grandmother (Sulafat)1.10xDeadpan requires deliberate pacing. Comedy dies above 1.1x.
Narrator (Aoede, VO)1.25xConversational warmth absorbs more speed. Sounds natural up to 1.25x.
Granddaughter dialogue (Aoede)1.15xIn-character speaking, needs more natural feel than narration.
Inner Monologue (Aoede)1.15xIntimate, reflective — should not feel rushed.

Tier 2: Shot Extensions (Editor Applies)

These shots get longer durations. All are VO-heavy shots where the visuals are rich enough to support a longer hold without dead air.

ShotOriginal DurationNew DurationDeltaJustification
89s12s+3sSuitcase reveal — the most visually rich shot in Act I. Can breathe.
108s11s+3sMontage (blanket, photo, hook) — dense visual action supports length.
139s11s+2sDill tearing + lamp reveal — sensory moment, should linger.
149s12s+3sWindowsill mosaic — most beautiful frame in Act II. Deserves room.
156s9s+3sKitchen orchestra — rhythmic visuals support longer hold.
209s11s+2sTwo hands making pickles — the most intimate shot. Let it breathe.
236s8s+2s”Adequate” — grandmother’s acceptance speech. Deserves room.
298s11s+3sInner monologue climax — the emotional peak. Must not feel rushed.
319s12s+3sFinal tableau — the resolution. Should be the widest, slowest shot.

Total extension: +24s
New runtime: 239 + 24 = 263s (4:23)
Buffer to 5:00 ceiling: 37s (14%)
Verdict: SAFE. ✓


Tier 3: Text Trimming (2 Stems — Tech Lead Re-generates)

These two stems are 171% and 199% over allocation. Even at 1.25x atempo with shot extension, they won’t fit without text cuts. I’ve trimmed to preserve the essential emotional beats while cutting connective tissue that the visuals carry.

Shot 31 — vo_7_31.wav (Narrator VO)

ORIGINAL (37 words):

“She looked out at the city — grey, modern, filing cabinets to the horizon — and for just a moment, I thought I saw her approve. Not the architecture. But the fact that it contained a kitchen, and in that kitchen, a jar.”

TRIMMED (21 words):

“And for just a moment, I thought I saw her approve. Not the architecture. But the kitchen. And the jar.”

What was cut: The scenic description (“grey, modern, filing cabinets to the horizon”) — the VISUAL does this work. The camera is on the city through the window. We don’t need the narrator to describe what we’re seeing. Also cut “the fact that it contained” — tightens to the three-beat landing: approve / not the architecture / the kitchen and the jar.

Expected TTS: ~10-11s raw. At 1.25x → ~8.5s. In a 12s shot. Clean.


Shot 29 — im_granddaughter_6_29.wav (Inner Monologue, Granddaughter)

ORIGINAL (39 words):

“It tasted like a kitchen I’d stood in when I was four, reaching up to a counter I couldn’t see over. It was not the same. But standing here, eating a pickle we had made together — it was ours.”

TRIMMED (22 words):

“It tasted like a kitchen I’d stood in when I was four. It was not the same. But it was ours.”

What was cut: “reaching up to a counter I couldn’t see over” (physical detail — lovely but expendable) and “standing here, eating a pickle we had made together” (the visual shows this). The three-beat structure survives: memory / acceptance / claim. “It was ours” is the line that matters — it must land with room to breathe.

Expected TTS: ~10-11s raw. At 1.15x → ~9s. In an 11s shot. Clean.


Updated Scene List Durations

The scene_list.md needs these 9 shot durations updated. I’ll apply after editor and tech lead confirm.


Runtime Verification (Post-Fix)

ComponentOriginalUpdatedDelta
Shot durations (33 shots)213s237s+24s
Voice gaps (6 × 1.5s)9s9s
Comedy breaths11s11s
Opening title5s5s
Closing credits12s12s
TOTAL239s (3:59)263s (4:23)+24s

4:23 is comfortable. 37s buffer to 5:00 ceiling. Even if TTS on the trimmed stems runs slightly hot, we have headroom.


Action Items

  1. Tech Lead: Re-generate 2 stems with trimmed text:

    • vo_7_31.wav → new text above
    • im_granddaughter_6_29.wav → new text above
    • Same voices (Aoede for IM, Aoede for VO). Same style prompts.
  2. Editor: Apply atempo rates per Tier 1 table. Apply shot extensions per Tier 2 table. Verify voice isolation holds after extensions (it should — we’re only adding visual time, not moving voice boundaries).

  3. Creative Director (me): Update scene_list.md with new durations and trimmed text after team confirms.


The words we cut are words we loved. But the pacing is the film. A rushed climax is worse than a shorter one.