
Google releases ‘Gemini 2.5 Computer Use model’

Google DeepMind has unveiled the Gemini 2.5 Computer Use model, a specialized AI system built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities and designed to power agents that interact with user interfaces (UIs) as naturally as humans do. Announced on October 7, 2025, the model is available in public preview through the Gemini API on Google AI Studio and Vertex AI, letting developers build assistants for web browsing, mobile control, and workflow automation. Google says it outperforms leading alternatives such as Claude and OpenAI’s tools on web and mobile benchmarks, with lower latency and 13 supported actions including clicking, typing, scrolling, and drag-and-drop. Optimized for browser tasks, it also shows promise for mobile UI control, is already used internally for software testing, and has an early access program for third-party developers.

Gemini 2.5 Computer Use represents Google’s push toward agentic AI, where models act autonomously in digital environments.

Gemini 2.5 Computer Use: Capabilities and How It Works

The model analyzes screenshots and URLs to interpret user requests, generating UI actions as function calls that client-side code executes before looping back with new inputs. This enables tasks such as searching, form filling, or navigating sites without APIs.

  • Supported Actions: 13 total, including open browser, type text, click, hover, scroll, drag/drop, back/forward, search, navigate URL, and keyboard shortcuts.
  • Benchmark Performance: Leads in web/mobile control tasks with low latency; strong on AndroidWorld for mobile UI.
  • Input/Output Loop: User prompt → model analyzes screenshot/URL → action call → execution → new screenshot/URL → repeat (sketched in code after this list).
  • Limitations: Browser-focused; not optimized for desktop OS control yet.
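
The loop itself is simple to express in code. Below is a minimal sketch in Python; take_screenshot, model_propose_action, and execute_in_browser are hypothetical placeholders standing in for the Gemini API call and a client-side executor such as Playwright, not names from any official SDK.

```python
# Sketch of the observe -> act loop described above. All three callables
# are hypothetical placeholders, not part of any official Google SDK.

def run_agent(task, take_screenshot, model_propose_action,
              execute_in_browser, max_steps=20):
    """Drive a UI agent: screenshot -> model -> action -> execute -> repeat."""
    screenshot, url = take_screenshot()
    for _ in range(max_steps):
        # The model inspects the current state and proposes one UI action
        # (click, type text, scroll, drag/drop, ...).
        action = model_propose_action(task=task, screenshot=screenshot, url=url)
        if action["name"] == "done":  # model signals the task is complete
            return action.get("result")
        # Client-side code performs the action in the real browser,
        # then captures the new state and loops back to the model.
        execute_in_browser(action)
        screenshot, url = take_screenshot()
    raise TimeoutError("Agent did not finish within max_steps")
```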

Demos (sped up 3x) showed it playing 2048 and browsing Hacker News, handling dynamic content naturally.

Action Type  | Examples                        | Use Case
Navigation   | Open URL, Back/Forward          | Web Research
Input        | Type Text, Keyboard Shortcuts   | Form Filling
Interaction  | Click, Hover, Drag/Drop, Scroll | Dynamic Sites
Search       | Web Search                      | Information Retrieval

Availability: Public Preview via Gemini API

Developers can access Gemini 2.5 Computer Use immediately in public preview on Google AI Studio and Vertex AI, with feedback encouraged via the Developer Forum.

  • Model ID: gemini-2.5-computer-use-preview.
  • Pricing: Standard Gemini 2.5 Pro rates; no additional costs for preview.
  • Integration: API for building agents; early access for automation tools.
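
As a minimal connectivity sketch, assuming the google-genai Python SDK and an API key from Google AI Studio (the computer-use tool configuration and screenshot handling are omitted here and would follow the preview documentation):

```python
# pip install google-genai
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Model ID taken from the announcement; a real computer-use agent would
# also pass the tool config and screenshots per the preview docs.
response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview",
    contents="List the UI actions you support.",
)
print(response.text)
```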

Google plans expansions to desktop and further mobile optimization.

Implications: Agentic AI for Everyday Automation

This model advances Google’s Project Mariner and AI Mode, enabling real-world agents for tasks like e-commerce or testing.

  • Developer Benefits: Faster prototyping with low-latency UI control.
  • Enterprise Use: Workflow automation, reducing manual interventions.
  • Ethical Focus: Built with safety measures for responsible interactions.

As DeepMind’s blog notes: “It’s a step toward agents that interact with UIs as humans do.”

Conclusion: Gemini 2.5 Computer Use’s UI Frontier

Google’s Gemini 2.5 Computer Use model, launched on October 7, 2025, empowers AI agents to navigate browser and mobile UIs, outperforming rivals on benchmarks. Available in preview via the Gemini API, it marks a milestone for developers building automation, and it raises the question of how quickly agents will redefine everyday digital tasks.
