Gemini task automation is slow, clunky, and super impressive

Photo: The Verge AI
Nine minutes: that is exactly how long it took artificial intelligence to order dinner. In the world of mobile technology that is an eternity, yet it represents a breakthrough in the development of personal assistants. Google Gemini, tested on the Pixel 10 Pro and Galaxy S26 Ultra, has just gained a task automation feature that lets it take control of the screen and operate third-party applications on its own. The solution is currently in beta and limited to selected rideshare and food delivery services, but for the first time users can watch AI act autonomously in real-world conditions rather than in controlled demos.

The system runs in the background, so the user can put the phone down while Gemini analyzes a menu, adds items to the cart (understanding, for example, that two half-portions make up one full dish), or schedules a trip to the airport based on calendar data. The process can be slow and occasionally clumsy, with the assistant overlooking an item on a list several times, yet its success rate in finalizing orders is surprisingly high. For security, the AI stops just before the payment button and requires final human confirmation.

For users, this signifies a paradigm shift: the phone ceases to be merely a tool we operate and becomes an autonomous agent performing tedious tasks for us. It marks the end of the era of simple voice commands and the beginning of genuine delegation of digital duties.
The vision of a digital assistant that relieves us of the tedium of clicking through apps has featured in tech giants' promotional materials for years, but reality has rarely lived up to the promises. What Siri or Google Assistant offered for a decade was more a set of simple voice scripts than autonomous action. The debut of task automation in Gemini, tested on the flagship Pixel 10 Pro and Samsung Galaxy S26 Ultra, marks a turning point, however. Although the system is currently in beta and can be frustratingly slow, for the first time we are dealing with technology that actually takes the reins of the smartphone interface.
When AI takes control of the screen
The new Gemini feature is not just about generating text or summarizing emails. It is an attempt to create an AI agent: software that understands the structure of mobile applications designed for humans and can navigate through them. In practice, the user issues one general command and Gemini begins "clicking" on their behalf. For now, the system supports a limited number of services, focused mainly on food delivery and transportation, such as Uber and Uber Eats.
- Autonomous navigation: The AI can independently scroll through menus, add products to the cart, and select delivery options.
- On-the-fly reasoning: The system demonstrates surprising logic — for example, when a menu only offers a "half portion," Gemini can add two items to fulfill an order for a full meal.
- Background work: Automation does not require constant attention; the process can run while the user does something else, a key advantage over tapping through the apps manually.
Despite these advantages, the process is far from instantaneous. Ordering dinner, which takes a human two minutes, can take Gemini up to nine minutes. The system "thinks" about every step, analyzes the screen content, and sometimes gets lost in a maze of buttons, which resembles watching a novice smartphone user struggling to find the right icons.
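Neither Google nor Samsung has documented how the agent is wired internally, but the behavior described above maps onto a classic perceive-reason-act loop. The Python sketch below is purely illustrative: `propose_action` stands in for a vision-capable model call, and screen capture and taps go through `adb`; none of this reflects any published Gemini API.

```python
import subprocess
import time

# Illustrative perceive-reason-act loop for a screen-driven agent.
# propose_action() is a stand-in for a vision-language model call;
# it is NOT a real Gemini API, just an assumption for this sketch.

def capture_screen(path: str = "/tmp/screen.png") -> str:
    """Grab the current screen as a PNG via adb (needs a connected device)."""
    png = subprocess.run(["adb", "exec-out", "screencap", "-p"],
                         capture_output=True, check=True).stdout
    with open(path, "wb") as f:
        f.write(png)
    return path

def tap(x: int, y: int) -> None:
    """Inject a tap at pixel coordinates, the way a finger would."""
    subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)],
                   check=True)

def propose_action(screenshot: str, goal: str) -> dict:
    """Stand-in for the model: given pixels and a goal, return the next
    UI action, e.g. {"type": "tap", "x": 540, "y": 1200} or {"type": "done"}.
    Each of these calls is where the minutes of "thinking" go."""
    raise NotImplementedError("replace with a vision-language model call")

def run_agent(goal: str, max_steps: int = 50) -> None:
    """Look at the screen, pick one action, act, repeat."""
    for _ in range(max_steps):
        action = propose_action(capture_screen(), goal)
        if action["type"] == "done":
            return
        if action["type"] == "tap":
            tap(action["x"], action["y"])
        time.sleep(1.0)  # let the app re-render before looking again

# run_agent("Order a full portion of pad thai from Uber Eats")
```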
The human interface barrier
The biggest challenge for Gemini is not a lack of computing power but the fact that today's applications are optimized for the human eye and finger, not for AI algorithms. Pop-up ads, complex graphical layouts, and ambiguous dish naming (e.g., "set" instead of "plate") are traps Gemini falls into regularly. Watching the model hunt for an appetizer sitting right in the middle of the screen can be painful.
This is a fundamental paradox: we are forcing the world's most advanced language models to interpret interfaces that are completely unnatural to them. AI doesn't need buttons, high-resolution photos, or promotional banners — it needs clean data.
Google's current approach, based on pure visual reasoning, is treated as a stopgap. The industry is moving toward standards such as the Model Context Protocol (MCP) and Android App Functions, which are intended to let applications share their functions directly with AI models, bypassing the visual layer. Until that happens, Gemini will be condemned to laboriously "clicking" through pixels, which will always generate delays and errors.
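To make the contrast concrete, here is roughly what the same capability looks like once an app exposes it as a structured function instead of pixels. This minimal sketch uses the open-source MCP Python SDK; the add_to_cart tool and its fields are invented for illustration and do not correspond to any real delivery service's schema.

```python
# Minimal MCP server sketch using the official Python SDK (pip install mcp).
# The tool itself is hypothetical: no delivery app exposes this today,
# which is precisely the article's point.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("food-delivery-demo")

@mcp.tool()
def add_to_cart(restaurant: str, item: str, quantity: int = 1) -> str:
    """Add a menu item to the order. Because the model calls this
    function directly, there are no banners, pop-ups, or ambiguous
    labels to misread, and no minutes of scrolling through pixels."""
    # A real app would call its internal order API here.
    return f"Added {quantity} x {item} from {restaurant} to the cart."

if __name__ == "__main__":
    mcp.run()  # serves the tool over the Model Context Protocol (stdio)
```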
Context that changes the rules of the game
However, the true power of Gemini reveals itself when artificial intelligence connects the dots between different Google services. In scenario tests, the AI showed impressive initiative in travel planning. With only general flight information saved in the calendar, Gemini was able to independently check the departure time from an email, calculate the optimal travel time to the airport taking into account the user's location, and propose booking an Uber for a specific time.
This is precisely where the difference between old assistants and the new generation lies. Traditional systems required precise commands ("Book an Uber for 11:30"). Gemini understands the intent ("Get me to my flight tomorrow on time") and performs the analytical work itself. The fact that the system distinguishes between colloquial terms and official names in app menus makes the barrier between natural language and code almost invisible.
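Stated as code, the difference is between filling the slots of a fixed command and decomposing a goal into lookups the system performs itself. The sketch below is hypothetical; the helper functions stand in for the calendar, email, and traffic queries Gemini chains through its Google service integrations.

```python
from datetime import datetime, timedelta

# Old-style assistant: every parameter must come from the user.
def book_ride(pickup_time: datetime, destination: str) -> None:
    print(f"Ride booked to {destination} at {pickup_time:%H:%M}")

# Hypothetical stand-ins for the lookups a new-style agent performs.
def flight_departure_from_email() -> datetime:
    return datetime(2026, 3, 21, 14, 30)   # parsed from a confirmation email

def travel_time_to_airport() -> timedelta:
    return timedelta(minutes=60)            # current location plus traffic

CHECK_IN_BUFFER = timedelta(hours=2)        # assumed airport margin

# New-style agent: the user states the intent, the system derives the slots.
def get_me_to_my_flight_on_time() -> None:
    departure = flight_departure_from_email()
    pickup = departure - CHECK_IN_BUFFER - travel_time_to_airport()
    book_ride(pickup_time=pickup, destination="Airport")

get_me_to_my_flight_on_time()  # -> Ride booked to Airport at 11:30
```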
Halfway to AI agents
Google builds in a safety switch here: automation stops just before the final payment button, and the user must approve the transaction, the only sensible arrangement in a beta. Although the system rarely goes rogue and usually configures orders correctly, it does make mistakes stemming from missing location data or app permissions, requiring manual intervention in the first minutes of a task.
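In agent design this pattern is usually called a human-in-the-loop gate: certain categories of action can never execute without explicit approval. A minimal sketch of the idea follows; the SENSITIVE set is an invention for illustration, since Google has not published its actual policy.

```python
# Human-in-the-loop gate: the agent may queue any action, but
# sensitive ones block until a person explicitly approves them.
# The policy set below is an assumption, not Google's actual rules.
SENSITIVE = {"submit_payment", "change_address", "delete_account"}

def execute(action: str, confirm) -> bool:
    """Run an action, pausing for approval if it is sensitive."""
    if action in SENSITIVE and not confirm(action):
        print(f"Held for user approval: {action}")
        return False
    print(f"Executed: {action}")
    return True

# The agent assembles the whole order autonomously...
ask = lambda a: input(f"Allow '{a}'? [y/N] ").strip().lower() == "y"
for step in ["open_app", "add_item", "set_delivery", "submit_payment"]:
    execute(step, confirm=ask)  # ...but payment waits for a human tap
```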
Despite its sluggishness, the new feature on the Pixel 10 Pro and Galaxy S26 Ultra is more than a technological curiosity. It is proof that the operating system of the future will be built not around app icons we have to open but around an intelligent intermediary layer. Gemini's current slowness is the price of learning to navigate a world designed for humans.
One could venture to say that we are on the threshold of an era in which the smartphone ceases to be a tool we operate and becomes a coordinator of our needs. Today's nine-minute wait for an AI to order a pizza is a transitional stage. Once developers start adapting their apps to standards like MCP, the same operations will take seconds, and the user's role will shrink to stating a wish and authorizing the payment with biometrics.