Google Gemini 3.5 Flash Can Now See and Control Your Screen

Google has given its Gemini 3.5 Flash AI model a new skill. The model can now look at a computer screen, understand what is on it, and click, type and scroll on its own. Google calls this “Computer Use.” In simple words, the AI can now operate a browser, a phone app or a desktop the way a person would. This makes Gemini 3.5 Flash an “agentic” model. (Agentic means the AI does not just answer questions; it takes actions step by step to finish a task for you.)

Until now, this screen-control skill lived in a separate, older Gemini 2.5 model. Building it directly into Gemini 3.5 Flash matters because Flash is Google’s fast, low-cost model, so developers get screen control without paying for a heavier model.

What “computer control” actually means

Most AI chatbots can only talk. This one can act. You give Gemini 3.5 Flash a goal, and it studies the screen, decides where to click, and does the work. Google says the model can “see, understand, and interact with computers, browsers, and mobile devices on its own.”

That opens up practical uses. A developer could ask the model to test a website by clicking through every button. A business could ask it to fill forms or move data between apps. The model works across browsers, mobile devices and desktops.
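To make the see-decide-act loop described above concrete, here is a minimal Python sketch of how an agent harness might drive a screen-control model. It is not Google’s code and does not call the real Gemini API: the `Action` format, `run_agent` and the three callbacks (`take_screenshot`, `choose_action`, `perform`) are hypothetical stand-ins for a screenshot tool, the model call and an automation tool such as a browser driver. The `max_steps` cap is an assumed safety limit so a confused agent cannot loop forever.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI step chosen by the model (hypothetical format, not the Gemini API)."""
    kind: str         # "click", "type", "scroll" or "done"
    target: str = ""  # what to act on, e.g. a button label
    text: str = ""    # text to type, if any


def run_agent(
    goal: str,
    take_screenshot: Callable[[], str],                    # hypothetical: captures the screen
    choose_action: Callable[[str, str, List[Action]], Action],  # hypothetical: asks the model
    perform: Callable[[Action], None],                     # hypothetical: clicks/types/scrolls
    max_steps: int = 20,                                   # assumed cap on loop length
) -> List[Action]:
    """Repeat see -> decide -> act until the model says it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()                     # 1. see the screen
        action = choose_action(goal, screen, history)  # 2. model picks the next step
        if action.kind == "done":
            break
        perform(action)                                # 3. carry out the step
        history.append(action)
    return history


# Toy usage: a scripted stand-in plays the model and fills a search box.
script = iter([
    Action("click", target="search box"),
    Action("type", target="search box", text="Gemini"),
    Action("done"),
])
log = []
steps = run_agent(
    goal="Search for Gemini",
    take_screenshot=lambda: "<screen pixels>",
    choose_action=lambda goal, screen, history: next(script),
    perform=lambda a: log.append(f"{a.kind}:{a.target}"),
)
print(log)  # ['click:search box', 'type:search box']
```

In a real setup, `choose_action` would send the goal, the latest screenshot and the step history to the model, and `perform` would drive a browser or device. Keeping those as plain callbacks lets the same loop run against a browser, a phone emulator or a desktop.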

How good is it? The benchmark scores

Google measured the model on OSWorld. This is a benchmark (a standard test) that checks how well an AI can finish real tasks on a computer desktop. A higher score is better. Here is how Gemini 3.5 Flash compares to rival models on the reported OSWorld scores.

Model	OSWorld score (computer control)
Anthropic Opus 4.8	83.4
GPT-5.5	78.7
Gemini 3.5 Flash	78.4
Anthropic Sonnet 4.6	78.4
Gemini 3.1 Pro	76.2
GPT-5.4 mini	72.1
Gemini 3 Flash	65.1

Reported OSWorld scores. What it means: Gemini 3.5 Flash scores 78.4, a big jump of 13.3 points from the older Gemini 3 Flash (65.1). It ties Anthropic’s mid-tier Sonnet 4.6 and trails only Anthropic Opus 4.8 (83.4) and GPT-5.5 (78.7), even though it is a fast, low-cost model.

Benchmarks & specs at a glance

Spec	Detail (as reported)
Model	Gemini 3.5 Flash
New skill	Computer Use (screen control) built in
OSWorld score	78.4
Works on	Browser, mobile, desktop
Where to get it	Gemini API, Gemini Enterprise Agent Platform
Extras	Browserbase demo, GitHub reference code

Figures and details as reported by The Decoder/Google.

What about safety?

Letting an AI click on its own is risky, so Google added safeguards. The company used “adversarial training” (training the model against tricky, harmful inputs) and gave businesses two optional safety switches.

The first switch asks a human to confirm before the AI does anything sensitive, like making a payment. The second one stops the task automatically if the model spots a “prompt injection.” (A prompt injection is a hidden trick on a webpage that tries to fool the AI into doing the wrong thing.) Google also tells companies to run the AI in a sandbox (an isolated test environment) and to keep a human overseeing its work.
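The two switches can be pictured as a check that runs before every action. The Python sketch below is a hypothetical illustration, not Google’s implementation or API: `ProposedAction`, `guard`, `TaskStopped`, the `injection_suspected` flag and the `SENSITIVE_KINDS` list are all assumptions made for this example. In Google’s description the model itself spots the injection; here that judgment is simply passed in as a flag.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical list of action types treated as sensitive (switch 1).
SENSITIVE_KINDS = {"payment", "purchase", "delete_account"}


@dataclass
class ProposedAction:
    """An action the agent wants to take next (hypothetical format)."""
    kind: str                          # e.g. "click", "payment"
    description: str                   # human-readable summary
    injection_suspected: bool = False  # hypothetical flag: model thinks the page is tricking it


class TaskStopped(Exception):
    """Raised when the whole task is halted for safety reasons."""


def guard(action: ProposedAction, ask_human: Callable[[str], bool]) -> bool:
    """Return True if the action may run, False if a human declined it.

    Raises TaskStopped on a suspected prompt injection (switch 2).
    """
    # Switch 2: a suspected prompt injection ends the task outright.
    if action.injection_suspected:
        raise TaskStopped(f"possible prompt injection: {action.description}")
    # Switch 1: sensitive actions wait for explicit human confirmation.
    if action.kind in SENSITIVE_KINDS:
        return ask_human(f"Allow '{action.description}'?")
    # Everything else runs without interruption.
    return True


# Toy usage: a human reviewer who declines every request.
decline = lambda question: False
print(guard(ProposedAction("click", "open pricing page"), decline))  # True
print(guard(ProposedAction("payment", "pay Rs 999"), decline))       # False
```

The injection check runs first, so an action coming from a page that looks like a trap is never even offered to the human for approval.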

Where you can use it

Developers can access the feature through the Gemini API and the Gemini Enterprise Agent Platform. Google also shared a live Browserbase demo and a reference build on GitHub so developers can get started quickly. This release fits a wider push by Google into enterprise AI agents, similar to how Indian IT firms are now building AI agents on Gemini Enterprise.

FAQ

What is Gemini 3.5 Flash?

It is Google’s fast, low-cost AI model. The new version can also see and control a computer screen on its own.

What is OSWorld?

It is a test that scores how well an AI finishes real tasks on a desktop. Gemini 3.5 Flash scored 78.4 out of 100.

Is it safe to let AI control my screen?

Google added safety switches, such as asking a human to confirm risky actions and stopping if the model spots a hidden trick. But Google itself still recommends running it in a sandbox with human oversight.

Why it matters (especially for India and founders)

India runs on services and software work. A fast, cheap AI that can operate apps could change how startups build products. A small team could automate software testing, data entry or customer tasks without hiring many people. Because Flash is cheap, even bootstrapped Indian founders can try it. This is also part of a bigger race, with India leading global demand for AI agents and assistants. The companies that learn to use screen-controlling AI early could move much faster than rivals.

The takeaway is simple. AI is moving from talking to doing. Gemini 3.5 Flash now sees your screen and acts on it, at a low cost. The big question for every founder is no longer “can AI answer this?” but “what work can I now hand to an AI agent?”

Source: The Decoder.
