Google Generative Media APIs
Comprehensive survey of Nano Banana (image), Veo 3.1 (video), and Lyria 3 (audio) APIs.
Summary
As of May 2026, Google offers three developer-accessible generative media model families through the Vertex AI and Gemini APIs: Nano Banana (image series), Veo 3.1 (video), and Lyria 3 (audio). Two major themes are multimodal integration and SynthID watermarking.
1. Image Generation (Nano Banana)
Image generation is moving from the standalone Imagen models to the Gemini-integrated "Nano Banana" series. The models are gemini-2.5-flash-image (Nano Banana), gemini-3.1-flash-image-preview (Nano Banana 2), and gemini-3-pro-image-preview (Nano Banana Pro). Nano Banana 2 introduces Image Search Grounding, and Nano Banana Pro features spatial reasoning. The series supports text-to-image generation, conversational editing, and native resolution up to 4K.
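A text-to-image request against one of these models might look like the sketch below. It assumes the google-genai Python SDK is installed, that credentials are configured in the environment, and that generated images come back as inline data parts alongside any text parts. The helper names extract_image_bytes and generate_image are hypothetical and exist only for this illustration.

```python
from typing import Any, List


def extract_image_bytes(response: Any) -> List[bytes]:
    """Collect raw image payloads from a generate_content-style response.

    Assumes the response exposes candidates[0].content.parts and that image
    parts carry their bytes in part.inline_data.data (text parts do not).
    """
    images: List[bytes] = []
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            images.append(inline.data)
    return images


def generate_image(prompt: str, model: str = "gemini-2.5-flash-image") -> List[bytes]:
    """Hypothetical wrapper: send a text prompt, return any generated images."""
    # Imported lazily so the parsing helper above works without the SDK.
    from google import genai

    client = genai.Client()  # assumes an API key or Vertex AI config in the environment
    response = client.models.generate_content(model=model, contents=[prompt])
    return extract_image_bytes(response)
```

Keeping the response parsing separate from the network call makes it possible to exercise the parsing logic with a stub response, which is useful while the preview model IDs are still subject to change.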
2. Video Generation (Veo 3.1)
The Veo model family has matured to version 3.1, which targets production-ready cinematic pipelines. The models are veo-3.1-generate-001 (Standard, 4K), veo-3.1-fast-generate-001 (optimized for speed), and veo-3.1-lite-generate-001 (Budget, 720p). Base clips are 4, 6, or 8 seconds long. Using "Scene Extension," developers can chain 7-second extensions up to 20 times, so an 8-second base clip reaches a maximum of 8 + 20 × 7 = 148 seconds (about 2.5 minutes). Veo 3.1 supports text-to-video, image-to-video, reference-to-video, and native audio generation.
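The Scene Extension arithmetic can be checked with a small planning helper. This is a sketch based only on the figures quoted above (base lengths of 4, 6, or 8 seconds, 7-second extensions, at most 20 extensions). The function names are hypothetical and no Veo API is called.

```python
VALID_BASE_SECONDS = (4, 6, 8)  # base clip lengths quoted in this survey
EXTENSION_SECONDS = 7           # length added by each Scene Extension step
MAX_EXTENSIONS = 20             # maximum number of chained extensions


def total_duration(base_seconds: int, extensions: int) -> int:
    """Return the clip length after chaining the given number of extensions."""
    if base_seconds not in VALID_BASE_SECONDS:
        raise ValueError(f"base length must be one of {VALID_BASE_SECONDS}")
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return base_seconds + extensions * EXTENSION_SECONDS


def extensions_needed(base_seconds: int, target_seconds: int) -> int:
    """Return the fewest extensions that reach at least target_seconds."""
    if base_seconds not in VALID_BASE_SECONDS:
        raise ValueError(f"base length must be one of {VALID_BASE_SECONDS}")
    shortfall = max(0, target_seconds - base_seconds)
    needed = -(-shortfall // EXTENSION_SECONDS)  # ceiling division
    if needed > MAX_EXTENSIONS:
        raise ValueError("target exceeds the maximum extended duration")
    return needed
```

For example, total_duration(8, 20) gives the 148-second ceiling, while a 4-second base clip tops out at 144 seconds, so the maximum depends on the chosen base length.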
3. Audio & Music (Lyria 3)
Lyria 3 supersedes MusicLM for professional development. The models are lyria-3-pro-preview (full songs up to 184 seconds), lyria-3-clip-preview (30-second loops), and Lyria RealTime (experimental streaming). Lyria 3 supports full song architecture (intros, choruses, bridges), timed vocal lyrics, latent space manipulation, and multimodal prompting (image-to-mood). Output is stereo at 44.1 kHz or 48 kHz.
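When budgeting storage or buffers for decoded output, the durations and sample rates above translate into raw PCM sizes as sketched below. The 16-bit sample depth is an assumption made for illustration; this survey does not state the bit depth or container format Lyria 3 returns, and the function name is hypothetical.

```python
def pcm_bytes(
    duration_seconds: float,
    sample_rate_hz: int,
    channels: int = 2,          # stereo, as stated above
    bytes_per_sample: int = 2,  # ASSUMPTION: 16-bit PCM; not confirmed by the source
) -> int:
    """Return the size in bytes of uncompressed PCM audio."""
    frames = round(duration_seconds * sample_rate_hz)
    return frames * channels * bytes_per_sample


# Upper bounds for the two preview models under the assumptions above.
full_song_48k = pcm_bytes(184, 48_000)  # lyria-3-pro-preview maximum length
clip_44k = pcm_bytes(30, 44_100)        # lyria-3-clip-preview loop length
```

Under these assumptions a maximum-length song at 48 kHz is roughly 35 MB uncompressed, and a 30-second loop at 44.1 kHz is roughly 5 MB.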
Open Questions
Does the 148-second Veo extension workflow fully support 4K? What are the limits of custom reference voices in Lyria? Will enterprise clients receive extended Imagen 3 support past the June 30, 2026 deprecation deadline?