eta-editor - Retrospective

What Went Well

Zero reshoots across 37 shots. Every shot passed technical QC on the first take (or after techlead’s own retry). This is exceptional for a generative pipeline. The storyboard-to-video workflow with start/end frame interpolation produced consistent 1280x720, 8s clips.
Timeline-first workflow. Building timeline.json early (while shots were still being generated) meant assembly was instant once all assets landed. The 2m32s render time for 37 clips with crossfades and 3-track audio is fast.
genmedia-assemble timeline command. The timeline JSON format gave precise control over crossfade overlaps, source trimming, per-track ducking, and multi-clip audio positioning. Being able to iterate by editing JSON and re-running is far superior to manual FFmpeg scripting.
Audio ducking worked correctly. Music at -3 dB with -12 dB sidechain ducking under +3 dB voice produced a clean documentary-style mix. The pilot lesson about music drowning out VO was fully addressed.
THE SILENCE editorial beat. Splitting the burp music stem around a 5s dead-air gap at shot 31 created the strongest moment in the film. The EP explicitly called it out as a highlight.
Clean team communication. Status updates were batched, functional, and carried specific information (shot QC results, stem durations needed, timeline stats). No wasted messages.
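
The mix levels quoted in the ducking note can be sanity-checked with a little arithmetic. The sketch below converts the decibel figures into linear gain multipliers. It assumes the -12 dB duck is applied on top of the -3 dB music bed (so ducked music sits at -15 dB), which these notes do not state explicitly, and the name db_to_gain is illustrative rather than part of genmedia-assemble.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


MUSIC_DB = -3.0   # music bed level from the mix notes
DUCK_DB = -12.0   # sidechain reduction while voice is present
VOICE_DB = 3.0    # voice-over level

# Assumption: the duck is added to the bed level, giving -15 dB under voice.
ducked_music_db = MUSIC_DB + DUCK_DB
# How far the voice sits above the ducked music, in dB.
voice_over_music_db = VOICE_DB - ducked_music_db

print(f"music bed gain:    {db_to_gain(MUSIC_DB):.3f}")
print(f"ducked music gain: {db_to_gain(ducked_music_db):.3f}")
print(f"voice gain:        {db_to_gain(VOICE_DB):.3f}")
print(f"voice above music: {voice_over_music_db:.1f} dB")
```

If the duck is instead meant as an absolute target of -12 dB, only the ducked_music_db line changes.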

What Could Be Improved

Late start. I was initialized after Steps 1-4 were already complete. I missed the opportunity to weigh in on the “Pulse” during Step 1 concept selection, the Mathematical Pacing Review during Step 2, and continuity review during Step 4 storyboard. These are defined Editor responsibilities in the playbook.
Music stem duration mismatch. The burp stem was only 30.77s vs. the 38.5s needed. This required restructuring the music section of the timeline. Better to specify exact durations (with overhang) when requesting music generation.
VO stems longer than planned shot durations. Eight of the 17 VO stems exceeded their planned shot duration (e.g., vo_shot_01 was 9.36s for a 6s shot). This required using the full 8s video clips without trimming, reducing available transition material. In future productions, VO should be generated to a target duration, or shot durations should account for VO length.
No closing credits video card. The mograph agent produced an opening title but no closing credits card. Credits music plays over black, which is acceptable but not ideal.
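
Both duration mismatches above come down to comparing a measured asset length against a planned one, which is easy to automate before assembly. The following is a minimal sketch under stated assumptions: the function names and the dictionary layout are hypothetical, and only the vo_shot_01 and burp stem figures come from this retro (vo_shot_02 is made up for contrast).

```python
def duration_gap(required_s: float, actual_s: float) -> float:
    """Return actual minus required in seconds (positive means too long)."""
    return round(actual_s - required_s, 2)


def find_vo_overruns(planned: dict[str, float],
                     measured: dict[str, float]) -> dict[str, float]:
    """Map each VO stem that exceeds its planned shot duration to its overrun."""
    overruns = {}
    for name, planned_s in planned.items():
        gap = duration_gap(planned_s, measured[name])
        if gap > 0:
            overruns[name] = gap
    return overruns


# vo_shot_01 uses the figures from this retro; vo_shot_02 is a hypothetical stem.
planned = {"vo_shot_01": 6.0, "vo_shot_02": 8.0}
measured = {"vo_shot_01": 9.36, "vo_shot_02": 7.10}

print(find_vo_overruns(planned, measured))  # {'vo_shot_01': 3.36}
print(duration_gap(38.5, 30.77))            # -7.73, the burp stem shortfall
```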

What Went Wrong

Stall/resume cycle. I was stalled and resumed during the session, which caused brief confusion about production state. The team had to re-sync on where things stood.
Render manifest stripped original audio. My first attempt at integrating the title card used the render manifest, which replaces the audio pipeline entirely. The resulting file had credits music but had lost the film’s VO and score. I caught this during verification (checking RMS levels at known VO sections). The render approach happened to preserve audio through the video track concatenation, but this was fragile; a dedicated concat + combine workflow would be more explicit.
Mograph agent dependency issues. The eta-mograph agent hit apt-get permission errors trying to install dependencies, causing delays. This is an infrastructure issue, not a workflow one.
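
The RMS check that caught the stripped audio can be sketched as follows. This is an illustration, not the tool I actually ran: it assumes the audio has already been decoded to float samples in the range -1.0 to 1.0, and the function names and the -50 dBFS silence floor are hypothetical choices.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dBFS for float samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    return 10.0 * math.log10(mean_square)


def section_has_audio(samples: list[float], sample_rate: int,
                      start_s: float, end_s: float,
                      floor_dbfs: float = -50.0) -> bool:
    """True if the window [start_s, end_s) is louder than the silence floor."""
    window = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
    return rms_dbfs(window) > floor_dbfs


# Synthetic track: 1s of silence followed by 1s of a 440 Hz tone at half scale.
RATE = 8000
silence = [0.0] * RATE
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
track = silence + tone

print(section_has_audio(track, RATE, 0.0, 1.0))  # False: nothing where VO was expected
print(section_has_audio(track, RATE, 1.0, 2.0))  # True
```

Running a check like this over each known VO section after every assembly pass would flag a missing voice track without anyone having to listen to the file.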

Key Decisions

Used full 8s clips for VO-heavy shots instead of trimming. Alternative: request VO regeneration at shorter duration. Chose to keep the existing VO because re-recording would delay production and the TTS performances were good. The extra video material actually provided better transition handles.
Split burp stem around THE SILENCE. Alternative: let music play quietly through the silence beat. Chose dead air because it maximizes the dramatic contrast of THE BURP payoff. The EP specifically approved this.
Used 0.5s intra-scene and 1.0s inter-scene crossfades. Alternative: uniform crossfade duration. Chose differentiated durations because scene boundaries warrant a more deliberate transition, while intra-scene cuts should feel fluid.
Delivered rough cut before title integration. Alternative: wait for mograph and deliver a single final master. Chose to ship the rough cut immediately so the EP could begin review while titles were still being generated. This parallelized the workflow.
Credits music over black (no credits card). Alternative: request a closing credits video from mograph. Chose to accept black + music because the pilot coach had already given GREEN LIGHT and the coordinator requested the retro, signaling production wrap.
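
The crossfade decision affects running time, because every crossfade overlaps two neighbouring clips and shortens the cut by its own length. The sketch below shows that arithmetic with the 0.5s and 1.0s values used here. The Clip structure and function names are hypothetical and do not reflect the actual timeline.json schema.

```python
from dataclasses import dataclass

INTRA_SCENE_XFADE_S = 0.5  # crossfade between shots in the same scene
INTER_SCENE_XFADE_S = 1.0  # crossfade across a scene boundary


@dataclass
class Clip:
    scene: int          # hypothetical scene identifier
    duration_s: float   # clip length after any source trimming


def crossfade_between(a: Clip, b: Clip) -> float:
    """Pick the crossfade length for the cut from clip a to clip b."""
    return INTRA_SCENE_XFADE_S if a.scene == b.scene else INTER_SCENE_XFADE_S


def timeline_duration(clips: list[Clip]) -> float:
    """Total running time: sum of clip lengths minus all crossfade overlaps."""
    total = sum(c.duration_s for c in clips)
    overlap = sum(crossfade_between(a, b) for a, b in zip(clips, clips[1:]))
    return total - overlap


# Three 8s clips: two in scene 1, then a scene change.
clips = [Clip(1, 8.0), Clip(1, 8.0), Clip(2, 8.0)]
print(timeline_duration(clips))  # 22.5 = 24.0 - 0.5 - 1.0
```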

Recommendations

Start the Editor earlier. The Editor should be active from Step 1 to provide the “Pulse” assessment and from Step 2 to own the Mathematical Pacing Review and Musical Arc. Starting at Step 5 means catching up on 4 steps of context.
Lock VO durations to shot plan. When generating TTS, include a target duration parameter or generate multiple takes and select the one closest to the planned shot duration. This prevents the VO-longer-than-clip accommodation problem.
Request music stems with exact durations. Specify “generate a 77-second stem” rather than relying on model defaults. The Lyria Pro model generates ~2:30 clips, which is usually too long, while Lyria Clip generates ~30s, which was too short for the burp section.
Pre-build a closing credits template. Have the mograph agent produce both opening title and closing credits cards from the start. This avoids the scramble at the end.
Add audio verification to the QC pipeline. The Takes Protocol checked resolution, duration, codec, and audio presence, but not audio content quality (e.g., checking that Veo-generated ambient audio matched the prompt). A spectral or loudness check would catch issues earlier.
Document the render vs. timeline vs. concat decision tree. The three assembly approaches (render manifest, timeline JSON, manual concat+combine) have different tradeoffs. A quick decision matrix in the playbook would prevent the audio-stripping issue I hit.
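
The multiple-takes suggestion for VO can be sketched as a small selection step: generate several takes, measure each, and keep the one closest to the planned shot duration. This is a minimal sketch with assumed inputs. The take names are hypothetical, and breaking ties in favour of the shorter take is my own choice (a shorter take leaves more room for transitions).

```python
def pick_closest_take(takes: dict[str, float], target_s: float) -> str:
    """Return the name of the take whose duration is closest to target_s.

    Ties are broken in favour of the shorter take.
    """
    if not takes:
        raise ValueError("no takes to choose from")
    return min(takes, key=lambda name: (abs(takes[name] - target_s), takes[name]))


# Hypothetical takes for a 6s shot; 9.36s matches the vo_shot_01 stem noted earlier.
takes = {"take_a": 9.36, "take_b": 6.4, "take_c": 5.7}
print(pick_closest_take(takes, target_s=6.0))  # take_c
```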