Google’s Latest AI Interacts With The Web Like An Everyday User

The Gemini 2.5 Computer Use model can use a browser like a person, navigating pages, entering text and retrieving information that APIs cannot provide.

NDTV Profit News

09 Oct 2025, 01:28 PM IST

It is built on Gemini 2.5 Pro’s ‘visual understanding and reasoning capabilities’. (Photo: Google)

Google has introduced Gemini 2.5 Computer Use, an AI model that can navigate the web through a browser and carry out actions such as filling in online forms. It is built on Gemini 2.5 Pro’s “visual understanding and reasoning capabilities”.

“While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example, filling and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing and scrolling,” Google said in a blog post.

At the heart of the system is a ‘computer use’ feature within the Gemini API, designed to run continuously in a loop. It works by taking in a user’s instructions, a snapshot of the current screen, and a log of recent interactions. Developers can restrict certain built‑in interface actions or add custom functions to expand its capabilities.

After processing the provided information, the system determines the next step: usually issuing a function call for a specific interface action like a click or a typed entry. In some cases, it may prompt the user for approval before proceeding, especially for sensitive operations like completing a purchase. The client-side software then carries out the required action.
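The loop described above can be sketched in a few lines of Python. This is an illustration only, not Google's implementation or the Gemini API: the `Action` class, `stub_model`, `run_agent_loop` and every parameter name are hypothetical, and the "model" is a scripted stand-in that returns canned actions so the example runs without any network access.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """A proposed UI action (hypothetical shape, not the real API schema)."""
    name: str                         # e.g. "click", "type_text", "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False  # set for sensitive steps such as a purchase


def stub_model(goal, screenshot, history):
    """Scripted stand-in for the model: picks the next canned action
    based on how many steps have already been recorded."""
    script = [
        Action("click", {"x": 120, "y": 240}),
        Action("type_text", {"text": "hello"}),
        Action("click", {"x": 300, "y": 400}, needs_confirmation=True),
        Action("done"),
    ]
    return script[min(len(history), len(script) - 1)]


def run_agent_loop(goal, model, execute, capture_screen, confirm,
                   excluded=frozenset(), max_steps=10):
    """Repeat: send goal + screenshot + history, receive one action, act on it."""
    history = []
    for _ in range(max_steps):
        action = model(goal, capture_screen(), history)
        if action.name == "done":
            break
        if action.name in excluded:
            # The developer has restricted this built-in action.
            history.append((action.name, "blocked"))
            continue
        if action.needs_confirmation and not confirm(action):
            # The user declined a sensitive step, so stop here.
            history.append((action.name, "declined"))
            break
        execute(action)  # the client-side software performs the action
        history.append((action.name, "executed"))
    return history


executed = []
history = run_agent_loop(
    goal="Submit the contact form",
    model=stub_model,
    execute=executed.append,
    capture_screen=lambda: "<screenshot placeholder>",
    confirm=lambda action: True,
)
print(history)
```

In a real client, `execute` would drive a browser and `capture_screen` would return an actual screenshot. The sketch only shows the control flow: one proposed action per turn, an optional approval step for sensitive actions, and a growing history that is fed back in on the next turn.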

“The Gemini 2.5 Computer Use model is primarily optimised for web browsers, but also demonstrates strong promise for mobile UI control tasks. It is not yet optimised for desktop OS-level control,” the blog post said.

According to The Verge, the timing of Google’s reveal closely follows OpenAI’s Dev Day, where new ChatGPT apps were showcased alongside continued development of its Agent tool, built to handle complex tasks autonomously. Meanwhile, Anthropic introduced a “computer use” capability in its Claude AI model as far back as last year.

Google shared demonstration clips showcasing how its computer use tool operates, noting that the videos have been accelerated to three times their normal speed.

Google claims, “It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency.”

The Verge adds that, in contrast to ChatGPT’s Agent and Anthropic’s computer use technology, Google’s latest AI is limited to operating within a web browser rather than having control over a full computer system.
