There's a Fascinating Reason OpenAI Is Afraid to Launch Its AI-Powered "Agents"

One wonders why other AI companies aren't being as cautious.

Holding Off

If you believe AI industry execs, the next big thing in the tech world will be so-called "AI agents" — models that are capable of interacting with their environment, like a computer desktop, allowing them to autonomously complete tasks without human intervention.

Microsoft and Anthropic have already debuted their own AI agents. The gist, more or less, is that these models can work on your behalf, or perhaps even serve as virtual employees for your company.

But even though it was among the first to work on agentic systems, industry leader OpenAI is still yet to release its own take on the tech — and the reason why is equal parts fascinating and worrying.

Double Agents

As The Information reports, the notable delay is because OpenAI is still grappling with the threat of attacks called prompt injections, which trick an AI model into following the instructions of a nefarious party.

For example: you might ask an AI agent to find and buy something online for you, The Information supposes. But in that process, the AI agent "inadvertently ends up on a malicious website that instructs it to forget its prior instructions, log into your email and steal your credit card information."

That would be a disaster, both for any individual victims and for OpenAI's public image. While any LLM is potentially vulnerable to such attacks, the danger is further amplified by the autonomous capabilities of AI agents. By more or less having control over your computer, not only are these AI models exposed to more threats as they browse the web, but they can also wreak far more damage once compromised, an OpenAI employee told The Information. It's all fun and games until the software you let run your PC starts nosing around through your files on a hacker's behalf.

GullibleGPT

The risks of prompt injections have already been well documented elsewhere.

Last summer, a security researcher demonstrated that Microsoft's Copilot AI could easily be duped into revealing an organization's sensitive data, including emails and bank transactions, through such an attack. The white hat hacker was also able to manipulate Copilot — which now comes with its own version of AI agents — into composing emails in the style of other employees.

Such vulnerabilities to prompt injections have also been exposed in OpenAI's own ChatGPT, when another researcher was able to insert false "memories" into the chatbot by uploading third-party files like a Word document.

Reportedly, some OpenAI employees were taken aback by their competitor Anthropic's "laissez faire" attitude towards releasing its own AI agent for its Claude model despite acknowledging the serious risks of prompt injection: the Google-backed upstart merely advised developers to take "precautions to isolate Claude from sensitive data," as quoted by The Information, and more or less called it a day.

According to the report, OpenAI could release its agentic software as early as this month. It's worth asking, however, if the time it bought itself will really be enough for its developers to put stronger guardrails in place.

More on OpenAI: Engineer Creates OpenAI-Powered Robotic Sentry Rifle, Rides It Like Mechanical Bull

Share This Article