Google now lets you direct avatars through prompts in its Vids app

A simple text prompt is all it takes for a digital avatar in the Google Vids app to interact with a product, grab a specific tool, or perform a scripted scene. Google has added support for its advanced Veo 3.1 model to the video editor, allowing users to direct virtual characters in natural language. The most significant breakthrough is full character consistency: despite dynamic changes in the frame and interactions with the environment, the avatar's appearance remains unchanged, which has long been a major challenge in generative video. The update also introduces practical improvements, such as a dedicated Chrome browser extension that simplifies recording and a direct export function to YouTube. For creative and business users, this means a drastic reduction in production time: professional product presentations or training materials no longer require hiring actors or complex post-production. Google Vids is thus becoming a complete cloud-based film studio in which the line between simple editing and advanced AI animation almost entirely disappears, putting tools previously reserved for professional VFX studios into the hands of amateurs.
Google is taking another step toward the complete automation of the video creation process for business. On Thursday, the Mountain View giant announced a significant expansion of the capabilities of its Vids app, introducing features that just a year ago seemed the domain of advanced post-production studios. The most important innovation is a system for directing virtual avatars using natural language prompts, allowing for precise control over their behavior on screen without the need for keyframe animation or technical skills.
This is not just another interface update. The integration of the Veo 3.1 model with the Vids ecosystem means that Google is striving to create a comprehensive environment where only a few lines of text separate an idea from a finished video. The application, which is part of the Workspace suite, is thus becoming a testing ground for the most advanced generative video technologies the company currently has in its portfolio.
Directing a digital actor with text
The key innovation in Vids is the ability to instruct avatars as if they were live actors on a film set. Users can now type commands that specify how the character should interact with their surroundings. We are not talking about static script reading here, but dynamic action within a scene. It is possible to command the avatar to interact with a specific product, grab a prop, or operate specialized equipment presented in the footage.
The biggest challenge in generative video has always been character consistency. Google claims that despite the dynamic nature of the generated scenes, Vids can maintain a consistent appearance and characteristic traits of the avatar throughout the duration of the film. This is critical for brands that want to build recognizable visual communication without risking their digital ambassador changing facial features halfway through a presentation.

Veo 3.1 and a new era of video efficiency
The introduction of support for the Veo 3.1 model is a signal that Google does not intend to cede ground to competitors like OpenAI or Runway. The new version of the model offers significantly better interpretation of complex prompts and higher visual quality of generated frames. In a business context, this means that training videos, sales presentations, or internal communications can look professional with minimal human effort.
In addition to advanced visual algorithms, Google has taken care of practical aspects of the workflow. Users have gained the ability to record material using a dedicated Chrome browser extension, making it easier to quickly capture the screen and weave it into the structure of the project being created. This tool perfectly fits the needs of remote teams that need to quickly create "how-to" tutorials.
- Directing avatars: Controlling character movement and interactions using text commands.
- Veo 3.1 support: Utilizing the latest generation of Google video models for better quality and realism.
- YouTube Export: Direct upload of finished projects to the video platform.
- Chrome Extension: Easy content recording and quick editing inside the application.

The ecosystem closes the production loop
The introduction of a direct video export option to YouTube is a strategic move. Google Vids is ceasing to be just a tool for creating "slides with a voiceover" and is becoming a full-fledged video editor that competes with simple online tools. Eliminating the need to download files to a drive and manually upload them significantly shortens the time needed to publish content, which is a key asset in today's pace of digital marketing.
It is worth noting how Google is positioning Vids. It is not intended to be a tool for professional editors, but for "knowledge workers" who need visual communication without the entry barrier of complicated software. The ability to personalize avatars means that every company can create its own database of virtual presenters tailored to the specifics of the industry or organizational culture.
"Vids maintains character consistency despite the dynamic nature of the generated scenes, allowing for the creation of professional interactions with products and equipment without involving a film crew."
Democratization of video production in corporations
The way Google is developing Vids suggests that the future of corporate communication will be based not on text but on short, AI-generated video. The ability to "direct" avatars is a breakthrough that solves the rigidity of earlier solutions. If a user can simply ask the system to have the avatar "point to the chart on the left" or "pick up the new smartphone model," expensive recording sessions and lengthy post-production become unnecessary.
This approach puts Google in a unique position. With data from Workspace, the computing power of the Veo models, and the reach of YouTube, the company is creating a closed loop of content production. The line between a simple presentation and a professional video is blurring before our eyes, and the precision of the prompt typed into the editor window is becoming the only real limitation. In the near future, we can expect the role of "prompt engineer" in marketing departments to evolve into that of a "digital director" who, instead of setting up lights on a set, fine-tunes the behavior parameters of virtual characters.
