Google has introduced Gemini 2.5 Computer Use, an AI model that can operate a web browser and carry out actions such as completing online forms. It is built on Gemini 2.5 Pro’s “visual understanding and reasoning capabilities”.

“While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example, filling and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing and scrolling,” Google said in a blog post.

At the heart of the system is a ‘computer use’ tool within the Gemini API, designed to run in a loop. On each turn it takes in the user’s instructions, a screenshot of the current screen, and a history of recent actions. Developers can exclude certain built‑in interface actions or add custom functions to expand its capabilities.
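
As a rough illustration of those inputs, the Python sketch below models a single turn's request and how exclusions and custom functions might shape the set of available actions. Everything in it is hypothetical: `TurnRequest`, `BUILT_IN_ACTIONS` and the action names are invented for this example and are not the Gemini API's real identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical action names, for illustration only.
BUILT_IN_ACTIONS = {"click", "type_text", "scroll", "navigate"}


@dataclass
class TurnRequest:
    """One turn's input: the goal, the current screen, and recent actions."""
    instruction: str
    screenshot: bytes
    history: list = field(default_factory=list)
    excluded_actions: set = field(default_factory=set)   # built-ins to switch off
    custom_functions: set = field(default_factory=set)   # developer-supplied extras

    def allowed_actions(self) -> set:
        """Built-in actions minus exclusions, plus any custom functions."""
        return (BUILT_IN_ACTIONS - self.excluded_actions) | self.custom_functions


request = TurnRequest(
    instruction="Fill in the contact form",
    screenshot=b"<png bytes>",
    history=["navigate"],
    excluded_actions={"navigate"},
    custom_functions={"open_app"},
)
print(sorted(request.allowed_actions()))
# ['click', 'open_app', 'scroll', 'type_text']
```

Here the developer has removed navigation from the built-in set and added one custom function, so the model would only be offered the four remaining actions.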

After processing these inputs, the model determines the next step, usually a function call for a specific interface action such as a click or a typed entry. In some cases, it may ask the user for confirmation before proceeding, especially for sensitive operations like completing a purchase. The client-side software then carries out the action, and the loop begins again with the updated screen.
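
The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: `Step`, `run_agent` and `scripted_model` are hypothetical names, and the model is replaced by a scripted stub so the example runs without any network access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """A proposed action; action=None means the model considers the task done."""
    action: Optional[str]
    args: dict
    needs_confirmation: bool = False


def run_agent(instruction, propose_step, take_screenshot, execute, confirm, max_turns=10):
    """Loop: ask for the next step, optionally confirm it, let the client execute it."""
    history = []
    for _ in range(max_turns):
        step = propose_step(instruction, take_screenshot(), history)
        if step.action is None:
            break
        if step.needs_confirmation and not confirm(step):
            history.append(("declined", step.action))
            break
        execute(step)  # the client-side software performs the action
        history.append(("executed", step.action))
    return history


def scripted_model(instruction, screenshot, history):
    """Stand-in for the model: types a name, then asks to confirm a purchase."""
    script = [
        Step("type_text", {"text": "Ada"}),
        Step("click", {"target": "Buy now"}, needs_confirmation=True),
    ]
    return script[len(history)] if len(history) < len(script) else Step(None, {})


performed = []
log = run_agent(
    "Order the item",
    scripted_model,
    take_screenshot=lambda: b"",
    execute=lambda step: performed.append(step.action),
    confirm=lambda step: False,  # the user declines the sensitive step
)
print(log)
# [('executed', 'type_text'), ('declined', 'click')]
```

Because the user declines the purchase in this run, the click is never passed to the client and the loop stops; passing `confirm=lambda step: True` instead would let both steps execute.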