Sound safe to you?

DesktopGPT

If for some reason you wanted to hand over total control of your personal computer to an AI model, you can now do that with Anthropic.

The Amazon-backed OpenAI competitor released an upgraded version of its Claude 3.5 Sonnet model on Tuesday that can perform basic tasks on your desktop, such as entering keystrokes and mouse clicks, which lets it operate potentially any application you have installed.
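
At launch, Anthropic exposed the capability as a tool developers enable through its API rather than as a consumer app. Below is a rough sketch of what that might look like with the Python SDK; the tool type and beta flag reflect what Anthropic published for the public beta, but treat the exact identifiers as assumptions that may have changed since.

```python
# Sketch: enabling Claude's "computer use" tool via Anthropic's Python SDK.
# The tool type and beta flag reflect the public beta at launch; treat the
# exact identifiers as assumptions that may have changed since.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    # Describe the virtual display the model will "see" and control.
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
        "display_number": 1,
    }],
    messages=[{
        "role": "user",
        "content": "Open my calendar and add a sunrise trip to the Golden Gate Bridge.",
    }],
    betas=["computer-use-2024-10-22"],
)

# The reply contains tool_use blocks describing the clicks, keystrokes, or
# screenshots the model wants; the caller's own code has to carry them out.
for block in response.content:
    print(block)
```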

"I think we're going to enter into a new era where a model can use all of the tools that you use as a person to get tasks done," Anthropic's chief science officer Jared Kaplan told Wired.

The update is Anthropic's entry in the industry race to take commercial AI models beyond the confines of a chatbox and turn them into full-blown "AI agents."

Tasks Failed Successfully

"AI agents" is the somewhat nebulous term used to describe productivity-geared AI models designed to use software and carry out other computer tasks like a human would, with varying degrees of versatility.

Some, like Cognition AI's Devin, are designed specifically for programming. Anthropic instead markets its AI agent as an all-rounder, claiming it can browse the web and use any website or application. What you ask of it is up to you: the tasks can be technical, like programming, or simpler, like planning a trip.
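
Under the hood, an agent like this runs in a loop: the model proposes a low-level action (a screenshot request, a click, some typed text), a small harness on the user's machine performs it, and the result is fed back to the model. Here is a minimal sketch of that harness side; the action names mirror Anthropic's reference implementation, but the exact schema and the use of the third-party pyautogui library are assumptions made for illustration.

```python
# Minimal sketch of the local "harness" half of a computer-use agent loop:
# it performs one model-proposed desktop action and reports the result back.
# The action schema is an assumption modeled on Anthropic's reference
# implementation; pyautogui is a third-party library chosen for illustration.
import base64
import io

import pyautogui  # pip install pyautogui pillow


def execute_action(action: dict) -> dict:
    """Carry out a single action proposed by the model."""
    kind = action.get("action")

    if kind == "screenshot":
        # Capture the screen so the model can see the current desktop state.
        image = pyautogui.screenshot()
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        return {"type": "image", "data": base64.b64encode(buffer.getvalue()).decode()}

    if kind == "left_click":
        # Click at the pixel coordinates the model asked for.
        x, y = action["coordinate"]
        pyautogui.click(x, y)
        return {"type": "text", "text": f"clicked ({x}, {y})"}

    if kind == "type":
        # Type literal text into whatever window currently has focus.
        pyautogui.typewrite(action["text"], interval=0.02)
        return {"type": "text", "text": "typed text"}

    return {"type": "text", "text": f"unsupported action: {kind}"}
```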

In a demo described by Wired, for example, Claude is asked to plan a trip to see the Golden Gate Bridge at sunrise with a friend. The AI opens a web browser, looks up a good viewing spot on Google along with other details, and adds the trip to a calendar app. Impressive, but Wired notes that it left out details that would have been helpful, like how to actually get there.

In another demo, Claude is prompted to set up a simple website, which it does using Microsoft's Visual Studio Code. It even runs a local server to test the site it just made. There's a small error in the result, but the AI corrects the code when prompted.

Double Agents

However promising the tech may seem, AI models still struggle with reliability, especially when it comes to writing code — and Anthropic's is no exception.

Even in a test as simple as booking flights and modifying reservations, Claude 3.5 Sonnet successfully completed less than half of the tasks, according to TechCrunch.

Clumsy as they may be, such AI agents also pose an obvious security risk. Would you want this experimental and sometimes unpredictable technology nosing around your computer files and using your web browser?

Anthropic says that releasing the technology at this early stage will help make AI agents safer, though perhaps at your expense.

"We think it's far better to give access to computers to today's more limited, relatively safer models," Anthropic wrote in a statement, per TechCrunch. "This means we can begin to observe and learn from any potential issues that arise at this lower level, building up computer use and safety mitigations gradually and simultaneously."

More on AI: Top "Reasoning" AI Models Can be Brought to Their Knees With an Extremely Simple Trick

