xAI has officially launched Grok Imagine Video 1.5, bringing its dedicated image-to-video and text-to-video generation model out of preview.
The model is now generally available through the xAI API (as grok-imagine-video-1.5), and a Video 1.5 Fast variant has rolled out on the web interface and the iOS and Android apps. The release is a substantial upgrade over version 1.0 and took the top spot on the LMSYS Image-to-Video Arena leaderboard at launch.
Technical Specifications & Performance Leap
The update is built on xAI’s Aurora autoregressive model, trained on a cluster of 110,000 Nvidia GB200 GPUs. The model generates each clip sequentially from the first frame forward, so every new frame is conditioned on the frames before it, which helps keep motion coherent from one frame to the next.
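To make the autoregressive idea concrete, here is a toy sketch in Python. It is not xAI’s Aurora model: frames are plain lists of numbers, and the hypothetical toy_model simply extrapolates motion from the last two frames where a real system would run a neural network. What it does show is the ordering constraint described above: frame n is produced only after, and only from, frames 0 through n-1.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def generate_autoregressive(
    first_frame: Frame,
    num_frames: int,
    next_frame: Callable[[List[Frame]], Frame],
) -> List[Frame]:
    """Generate frames one at a time, each conditioned on all frames so far."""
    frames = [first_frame]
    while len(frames) < num_frames:
        # The model only ever sees the history, never future frames.
        frames.append(next_frame(frames))
    return frames


def toy_model(history: List[Frame]) -> Frame:
    """Hypothetical stand-in for a learned model: continue the motion
    implied by the last two frames (constant-velocity extrapolation)."""
    last = history[-1]
    if len(history) < 2:
        return [p + 1.0 for p in last]  # no velocity yet: assume a small step
    prev = history[-2]
    return [l + (l - p) for l, p in zip(last, prev)]


clip = generate_autoregressive([0.0, 10.0], num_frames=5, next_frame=toy_model)
print(clip)  # [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0], [4.0, 14.0]]
```

Because each step reads the previous output, any error also carries forward, which is why sequential generation pays off only when each frame prediction is consistent with its history.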
- Resolution Options: 480p (optimized for fast drafts) and 720p (for final cinematic output).
- Frame Rate & Length: 24 frames per second (fps), with clip durations from 6 to 15 seconds.
- “Fast” Tier Latency: The 1.5 Fast tier cuts generation latency by roughly 40%, rendering a 6-second, 720p video in about 25 seconds, down from the 40+ seconds the previous generation required.
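The spec numbers can be sanity-checked with a little arithmetic, using only the figures in the list above. At 24 fps, a 6-second clip is 144 frames and a 15-second clip is 360. Treating the previous generation’s time as exactly 40 seconds (the lower bound of “40+”) gives a 37.5% reduction, so the “roughly 40%” figure is consistent with a starting time slightly above 40 seconds.

```python
FPS = 24  # frame rate stated in the spec


def frame_count(seconds: int, fps: int = FPS) -> int:
    """Number of frames in a clip of the given length."""
    return seconds * fps


def latency_reduction(old_s: float, new_s: float) -> float:
    """Fractional reduction in generation time, e.g. 0.375 for 37.5%."""
    return (old_s - new_s) / old_s


shortest, longest = frame_count(6), frame_count(15)
print(shortest, longest)  # 144 360

# 40 s is the lower bound of the quoted "40+ seconds", so this is a floor.
print(f"{latency_reduction(40, 25):.1%}")  # 37.5%
```

An exact 40% cut to 25 seconds would imply a previous time of 25 / 0.6, or about 41.7 seconds, which fits the “40+” wording.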
The Core Upgrades: What’s New in 1.5?
The release brings significant upgrades to physics rendering, audio generation, and the creative tools around the model.
1. Native Multimodal Audio (Single-Pass Generation)
Many AI video tools require sound effects and dialogue to be added in post-production. Grok Imagine Video 1.5 instead generates synchronized audio natively, in the same pass as the video, using visual cues to produce:
- Accurately lip-synced character dialogue, including accents that fit the context.
- Ambient environmental sounds (like rainfall, room tone, or wind).
- Sound effects precisely timed to on-screen actions (like a blade whooshing or heavy footsteps).
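At 24 fps, “precisely timed” has a concrete meaning: each frame lasts about 41.7 ms, so a sound effect lands on a specific frame index. The sketch below maps hypothetical event timestamps to frames and computes how many audio samples accompany each frame, assuming a 48 kHz sample rate. xAI has not stated the model’s audio format, and this only illustrates the timing arithmetic, not how the model aligns audio internally.

```python
FPS = 24              # video frame rate from the spec
SAMPLE_RATE = 48_000  # ASSUMED audio sample rate (not stated by xAI)


def frame_for_event(t_seconds: float, fps: int = FPS) -> int:
    """Index of the frame on screen at time t (frame k spans [k/fps, (k+1)/fps))."""
    return int(t_seconds * fps)


def samples_per_frame(sample_rate: int = SAMPLE_RATE, fps: int = FPS) -> int:
    """Audio samples that play during one video frame."""
    return sample_rate // fps


# Hypothetical event timeline: (sound, time in seconds)
events = [("footstep", 0.5), ("blade whoosh", 2.25), ("door slam", 5.9)]
for name, t in events:
    print(f"{name}: frame {frame_for_event(t)}")
print(samples_per_frame())  # 2000
```

A sound that starts even one frame late is offset by roughly 42 ms, which is why single-pass generation (rather than manual alignment afterward) matters for effects tied to on-screen impacts.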
2. Upgraded Physics & Motion Continuity
Version 1.5 substantially reduces the unnatural warping and morphing that often plague AI-generated video. It renders complex interactions, such as fluid dynamics, rising steam, micro-expressions, and light refracting through translucent glass, with believable weight and momentum.
3. Reduced Quality Loss Across Extensions
When filmmakers and content creators chain multiple clips to extend a scene, AI-generated video often degrades sharply with each extension. The 1.5 model preserves image fidelity, lighting, and subject details across extended generations, keeping the footage from becoming grainy or oversaturated over time.
Creative Workflow Tools
Alongside the model upgrade, xAI has added a set of productivity tools built directly into the creative workflow:
- Projects: A new sidebar feature for organizing clips, assets, and generations into separate folders.
- Parallel Agents: Users can launch multiple generation agents at once, so several video concepts render in parallel instead of one after another.
- Semantic Library Search: An AI-backed search tool that lets creators find earlier clips or source images by describing them in plain language, instead of scrolling through their generation history.
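The benefit of Parallel Agents is the usual concurrency win: total wait time approaches that of the slowest job rather than the sum of all jobs. The sketch below simulates this with Python’s asyncio. render_concept is a hypothetical placeholder that sleeps instead of calling any real xAI API, and nothing here reflects how xAI actually schedules jobs.

```python
import asyncio
import time


async def render_concept(prompt: str, seconds: float) -> str:
    """Hypothetical stand-in for one generation job; sleeps instead of rendering."""
    await asyncio.sleep(seconds)
    return f"clip for: {prompt}"


async def render_all(prompts: list[str], seconds: float) -> list[str]:
    # Start every job at once; gather returns results in input order.
    return await asyncio.gather(*(render_concept(p, seconds) for p in prompts))


prompts = ["neon street at night", "desert storm", "underwater reef"]
start = time.perf_counter()
clips = asyncio.run(render_all(prompts, seconds=0.2))
elapsed = time.perf_counter() - start
print(clips)
print(f"{elapsed:.1f} s")  # roughly one job's duration, not three
```

Awaiting each render_concept call in a plain loop instead would take about three times as long, which is the waiting the feature is meant to remove.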
