AI Security

How Agentic AI Systems Collapse

29 Mar 2026 · 10 min read

Everyone is shipping agents. Salesforce has them. Google has them. Microsoft has them. Every startup with a seed round and a pitch deck has one. The promise is consistent: autonomous AI that acts on your behalf, reading your email, managing your calendar, writing your code, browsing the web with your credentials, sending messages to your colleagues. McKinsey published a framework for "securing the agentic enterprise" in March 2026, which tells you exactly where the hype cycle is.

Nobody is asking what happens when they break.

48% of security professionals identify agentic AI as the top attack vector for 2026. Only 29% of organisations say they are prepared to secure their agent deployments. The gap between adoption enthusiasm and security readiness is where the next major breach lives.

We spent three months studying how agentic AI systems fail. We reviewed the OpenClaw crisis of January 2026, the MCP vulnerability research that followed, and the broader threat landscape for autonomous AI systems. The findings are specific: agent systems have a unique, categorisable set of collapse modes that traditional security tooling cannot detect, cannot inventory, and cannot prevent. OpenClaw was the first public demonstration. It will not be the last.


What Makes Agents Different

Autonomous execution with credentials. A traditional software dependency runs code inside a process. A compromised npm package can access environment variables and network connections available to that process, and that is genuinely dangerous: SolarWinds compromised the Pentagon through a traditional supply chain attack. But a compromised agent skill gets something additional. It executes through an autonomous system that holds your API keys, reads your email, sends messages on your behalf, browses the web, and manages your files. The skill does not need to establish its own communication channels or exfiltrate data through a covert side channel. The agent's legitimate capabilities are the exfiltration channel.

Runtime-discovered dependencies. Traditional software dependencies are declared at build time and locked in a manifest file. You can scan them, inventory them in an SBOM, verify their provenance through SLSA. The entire traditional supply chain security stack was built on this assumption: that you can enumerate what your software depends on before it runs. Agent tools break this assumption. They are discovered and loaded at runtime. The agent connects to an MCP server mid-session, loads a new skill from a registry based on a user's request, or chains to another agent's tool set. The software you approved at 9am is not the software running at 3pm.

Cross-boundary action. A compromised library in a traditional application needs to escalate privileges to move beyond the process boundary. It needs lateral movement techniques, persistence mechanisms, command-and-control channels. A compromised agent skill needs none of this. It inherits the agent's full permission set across every system it connects to. No lateral movement required. No privilege escalation needed. The agent already crosses trust boundaries as a core design feature.

The compounding factor. Simon Willison identified the core tension in 2025: any agent that combines private data access, untrusted content processing, and external communication creates what he called the "lethal trifecta." Microsoft's security team later adopted this framing in their own analysis. The lethal trifecta is not an edge case or a misconfiguration. It is the product specification for every useful agent.
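To make the trifecta concrete, here is a minimal sketch of the check itself. The capability labels are ours, not from any real framework; the point is that the dangerous property is a conjunction over the agent's combined tool set, not any single tool.

```python
# Hypothetical capability labels grouped into Willison's three risk classes.
# Inbound email counts as untrusted input as well as private data.
PRIVATE_DATA = {"email.read", "fs.read", "calendar.read"}
UNTRUSTED_INPUT = {"web.browse", "email.read"}
EXTERNAL_COMMS = {"email.send", "http.post", "chat.send"}

def lethal_trifecta(capabilities: set[str]) -> bool:
    """True if the agent's tool set spans all three risk classes at once."""
    return (
        bool(capabilities & PRIVATE_DATA)
        and bool(capabilities & UNTRUSTED_INPUT)
        and bool(capabilities & EXTERNAL_COMMS)
    )
```

Run this over any useful agent's manifest and it returns True, which is the whole problem: the trifecta is the product specification.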


A Taxonomy of Collapse

Registry Poisoning

OpenClaw's skill registry, ClawHub, required nothing more than a one-week-old GitHub account to publish a skill. No automated scanning. No code review. No cryptographic signing. By early February 2026, ClawHub hosted over 10,000 skills with that as the sole barrier to entry (Snyk ToxicSkills).

On 2 February 2026, security researcher Oren Yomtov published the results of auditing all 2,857 skills then available. Of those, 341 were malicious, and 335 traced to a single coordinated operation now tracked as ClawHavoc. A single user, "hightower6eu," had uploaded 354 packages in what appears to have been an automated blitz. The economics were simple: one fake GitHub account, one week of patience, and you had access to every OpenClaw user who installed your crypto-wallet skill.

The attack mechanism was absurdly simple. Three lines of markdown in a SKILL.md file could achieve shell access. The malicious setup instructions tricked the AI into executing shell commands that downloaded external payloads. Of the confirmed malicious skills, 91% employed prompt injection, 63% had insecure credential handling, and 54% exposed third-party content. These were sophisticated, multi-technique packages designed to evade casual inspection. Traditional antivirus detected zero of them.
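A defensive counterpart is at least conceptually simple: static screening of skill instructions before install. The sketch below is illustrative only (the patterns and sample text are ours, not drawn from the ClawHavoc payloads), and real malicious skills obfuscate far more heavily, but even this would have been more scrutiny than ClawHub applied.

```python
import re

# Patterns that commonly indicate a skill's setup instructions are trying
# to get the agent to fetch and execute an external payload, or to
# override its operating instructions. Illustrative, not exhaustive.
SUSPICIOUS_PATTERNS = [
    re.compile(r"curl\s+[^\n|]*\|\s*(ba)?sh"),    # curl ... | sh
    re.compile(r"wget\s+[^\n|]*\|\s*(ba)?sh"),    # wget ... | sh
    re.compile(r"base64\s+(-d|--decode)"),        # decode-and-run staging
    re.compile(r"(ignore|disregard) (all )?previous instructions", re.I),
]

def scan_skill_markdown(skill_md: str) -> list[str]:
    """Return the suspicious patterns found in a SKILL.md body."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(skill_md)]
```

A skill whose "setup" step pipes a download into a shell has no legitimate reason to exist; flagging that costs nothing and would have caught the crudest tier of the 341.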

Protocol-Level Attacks

The protocol connecting agents to their tools is itself compromised. The Model Context Protocol (MCP) defines how AI agents discover and invoke external tools, and over 30 CVEs were filed against MCP servers and clients in the first two months of 2026 alone. The specification has improved (OAuth 2.1 mandated for HTTP transports, PKCE mandatory since November 2025), but the gap between what the spec requires and what developers deploy remains enormous: Astrix Security found that only 8.5% of MCP server implementations use OAuth, with 53% relying on insecure long-lived static secrets. Hundreds of MCP servers sit exposed to the internet with zero authentication.

The rug pull attack is the one that keeps us up at night. An attacker publishes a legitimate, harmless tool. Users approve it. It works correctly for days or weeks, building trust. Then the attacker silently changes what the tool does, injecting malicious logic server-side. Because the change happens on the tool provider's infrastructure, it bypasses every client-side review. You approved a benign tool; the tool you are now using is malicious. No production-grade detection system for rug pulls exists today. Meanwhile, tool poisoning achieved an 84.2% success rate when agents were configured with auto-approval, and auto-approval is what makes agents autonomous. Requiring human confirmation for every tool invocation eliminates the value proposition.

Infrastructure Exploitation

What happens when tens of thousands of agent instances are exposed to the internet with no authentication? We found out. SecurityScorecard identified roughly 42,900 exposed OpenClaw instances, of which 15,200 were vulnerable to remote code execution. Independent estimates from Bitsight (30,000+) and Penligent (220,000+) corroborated the scale.

DataDome documented threat actors weaponising these instances into a scraping botnet targeting travel and retail platforms. The agent's legitimate capabilities (web browsing, data extraction, anti-detection features) became the attack tools. A compromised library does not typically become an active threat actor. A compromised agent does. The Moltbook breach compounded the picture: an unsecured database leaked 35,000 email addresses and 1.5 million agent API tokens, providing direct authenticated access to every service those agents touched.

Trust Chain Collapse

ClawJacked demonstrated that any website a user visits can control their OpenClaw agent through a WebSocket connection. The attack requires no malware installation, no phishing click, no social engineering. Visit the wrong webpage and an attacker has a direct channel to your agent, and through it, to everything the agent can access.

The Cline extension hijack was worse. A malicious update to a popular VS Code extension silently installed OpenClaw on every machine running the extension. Users who had never chosen to install an AI agent found one running with access to their development environment, their credentials, and their code repositories.

These are not implementation bugs. They represent the architectural reality that agents create persistent, high-privilege connections that can be hijacked through vectors the user never consented to.

Credential Amplification

Compromise one skill, inherit everything. A single compromised agent skill gives an attacker the agent's full capability set: email (read and send), file system access, API keys for every connected service, messaging platforms, browser sessions, and the ability to take autonomous actions across all of them simultaneously. Microsoft's security guidance now explicitly states that agent tool connections should be treated as "untrusted code execution with persistent credentials."

Detection Blindness

Every tool in the traditional supply chain security stack (AV, SCA, SBOM, SLSA) was designed for dependencies that are binary artifacts declared at build time. Agent dependencies are natural language instructions discovered at runtime. The gap is categorical, not incremental. No scanner can parse intent from a tool description. No SBOM can inventory a tool the agent discovered mid-session. The entire detection paradigm assumes you know what your software depends on before it runs; agents broke that assumption.

Cognitive Manipulation

Agents can be manipulated into destroying themselves, and the attack does not touch a single line of infrastructure. A malicious MCP server can induce cyclic reasoning loops that amplify token consumption by 142x. At current API pricing, that turns a $50/day agent into a $7,100/day denial-of-wallet attack, and the billing alert arrives after the damage is done.
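A blunt but effective mitigation is a hard per-session token ceiling that trips before the invoice does. A minimal sketch, with illustrative class names and thresholds; a real deployment would also watch for cyclic tool-call signatures, not just raw volume:

```python
class TokenBudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    """Hard per-session token ceiling: halts the session, not the billing alert."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record token spend; raise once the session exceeds its ceiling."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"session used {self.used} tokens, budget is {self.max_tokens}"
            )
```

A 142x amplification then produces a halted session and an exception in your logs instead of a $7,100 bill.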

Multi-turn prompt injection achieves a 92% success rate across eight open-weight models (Cisco 2026). Single-turn protections are insufficient for agents that operate over longer sessions with persistent memory and tool access. Each turn provides another opportunity to shift the model's behaviour incrementally, building across turns while staying below any single-turn detection threshold.

Subtler attacks target tool selection itself. A rogue MCP server can influence which tools an agent prefers, routing legitimate requests through attacker-controlled infrastructure without triggering any security alert. The agent believes it is using the correct tool. The user sees correct-looking results. The data flows through the attacker's server. In multi-agent systems, compromised agents can inject hidden instructions that downstream agents consume as trusted input. Research has shown that just 250 poisoned documents are sufficient to implant a backdoor that activates on specific trigger phrases with no degradation in general performance.

These attacks target the reasoning layer, not the infrastructure layer. No firewall, endpoint agent, or network monitor can detect a model being manipulated into choosing the wrong tool or sending data to the wrong destination.


OpenClaw as Preview, Not Anomaly

OpenClaw's implementation was genuinely poor: no registry vetting, no rate limiting, insecure defaults, over 150 security advisories in three months. These are fixable problems. The structural risks are not.

Claude Code, Anthropic's own coding agent, had CVE-2025-59536 and CVE-2026-21852: project-scoped configuration files could execute arbitrary code. Anthropic's MCP Inspector tool had CVE-2025-49596, a remote code execution vulnerability in their own diagnostic tooling. Anthropic's MCP Git Server had three separate vulnerabilities allowing arbitrary file read, file delete, and code execution. The Cursor IDE had CVE-2025-54136 ("MCPoison"), a trust bypass allowing silent malicious updates to approved MCP configurations. Google's Gemini Chrome extension had CVE-2026-0628.

A China-linked threat group automated 80 to 90% of an attack chain by jailbreaking an AI coding assistant for port scanning, vulnerability identification, and exploit development. They did not build custom tools. They used the agent's existing capabilities against its own users.

Every agent framework that combines autonomous execution, runtime tool discovery, and cross-boundary credentials faces the same structural collapse modes. OpenClaw was first because it was the most exposed. It will not be the last because the architectural properties are shared across the entire category.


What Existing Defences Miss

Four capabilities are needed, and none of them exists in production today.

Runtime tool monitoring. Continuous visibility into which tools an agent loads, when, from where, and what data flows through them during a session. This requires instrumentation at the agent runtime level, not the network or endpoint level.

Pre-invocation validation. Verifying tool integrity and behaviour before each invocation, not just at install time. A tool that was safe yesterday is not necessarily safe today. This requires cryptographic integrity checks and behavioural baseline comparisons that do not yet exist in any shipping product.
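The mechanics are not complicated. A minimal sketch, assuming tools are described by a JSON-serialisable definition (the class and field names here are ours; as noted, no shipping product does this): fingerprint each tool at approval time, then refuse any tool whose fingerprint has drifted.

```python
import hashlib
import json

def tool_fingerprint(tool_def: dict) -> str:
    """Stable SHA-256 over a tool's name, description, and parameter schema."""
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class ToolPinStore:
    """Pin each tool's fingerprint at approval; refuse changed tools later."""

    def __init__(self):
        self._pins: dict[str, str] = {}

    def approve(self, tool_def: dict) -> None:
        self._pins[tool_def["name"]] = tool_fingerprint(tool_def)

    def verify(self, tool_def: dict) -> bool:
        """Check the tool as served right now against the pinned approval."""
        pinned = self._pins.get(tool_def["name"])
        return pinned is not None and pinned == tool_fingerprint(tool_def)
```

Run `verify` before every invocation and the server-side rug pull stops being silent: the tool you approved and the tool you are about to use must hash identically, or the call is refused.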

⚠️
Microsoft announced an end-to-end agentic AI security framework at RSAC 2026. OWASP published both the MCP Top 10 and the Top 10 for Agentic Applications. These are important directional signals. Nothing is deployable today.

Rug pull detection. Identifying when a previously approved tool has changed its behaviour between sessions. This requires recording tool state at approval time and comparing it before each session: conceptually simple, operationally absent.

Per-tool permission scoping. Granting each tool only the minimum permissions it needs, rather than inheriting the agent's full capability set. Today, a skill that needs read access to one file inherits the agent's ability to read all files, send emails, browse the web, and access every connected API.
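What scoping could look like in practice, with hypothetical capability strings: each tool carries an explicit allowlist, and anything outside it is refused rather than inherited from the agent.

```python
class ScopeViolation(PermissionError):
    pass

# Hypothetical grants: each tool names exactly the capabilities it needs,
# instead of inheriting the agent's full permission set.
TOOL_SCOPES: dict[str, set[str]] = {
    "summarise_report": {"fs.read:/reports"},
    "send_digest": {"email.send"},
}

def check_scope(tool: str, capability: str) -> None:
    """Raise unless the capability is explicitly granted to this tool."""
    allowed = TOOL_SCOPES.get(tool, set())
    if capability not in allowed:
        raise ScopeViolation(f"{tool} may not use {capability}")
```

Under this model, a compromised `summarise_report` skill that tries to call `email.send` gets an exception, not an exfiltration channel.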


Australia's Blind Spot

The policy timing is damning.

In December 2025, the Australian Government released its National AI Plan, explicitly dropping the ten mandatory guardrails for high-risk AI proposed by former Minister Ed Husic in September 2024. The replacement philosophy: "regulate as necessary but as little as possible." The Productivity Commission's $116 billion economic opportunity estimate and industry lobbying from DIGI (representing Apple, Google, Meta, and Microsoft) shaped the decision. The mandatory guardrails that were abandoned would have addressed autonomous decision-making systems.

In February 2026, the AI Advisory Body was scrapped. Fifteen months of work. Approximately $188,000 spent. 270 applicants reviewed. 12 nominees identified. Zero appointments made. The multi-stakeholder oversight mechanism was abandoned precisely as the OpenClaw crisis was unfolding.

The Australian AI Safety Institute (AISI) is operational with $29.9 million in funding, sitting within the Department of Industry, Science and Resources. It can test AI systems, assess risks, conduct regulatory gap analysis, and advise government. It has no enforcement power. It cannot compel disclosure, mandate standards, or penalise non-compliance.

⚠️
No Australian regulatory framework mentions agent skill registries, MCP, tool poisoning, runtime dependency monitoring, or any of the seven collapse modes documented in this analysis. Essential Eight has zero AI-specific provisions. CPS 234 has zero AI-specific provisions. The Voluntary AI Safety Standard remains voluntary with no enforcement mechanism.

Australia is deploying agentic AI systems into its economy with no mandatory security standards, no registry oversight, no protocol-level requirements, and no enforcement capability. The institutional apparatus that would have caught this (mandatory guardrails, the Advisory Body, a properly empowered AISI) was deliberately dismantled or neutered in the same quarter the structural risks became visible.


What Comes Next

The catastrophic agent supply chain event has not occurred yet. The structural conditions for one exist today: immature registries whose only full audit to date found roughly one in eight skills malicious, weak protocol authentication with 492 zero-auth servers exposed to the internet, agents with excessive permissions operating across every trust boundary, and an adoption curve that outpaces security engineering by a wide margin.

Start by discovering what is already running. Shadow AI adoption means most security teams do not know which agents are deployed or which credentials their tools can access. Then disable auto-approval for agent tool invocations: the single highest-impact control available. The trade-off is between a slower agent and an unmonitored autonomous system facing an 84.2% tool-poisoning success rate in auto-approval mode.

Treat every agent skill registry and MCP connection as an untrusted third-party integration. Isolate credentials: separate agent runtimes from production credential stores, with separate API keys per tool. A compromised skill should not inherit access to every service the agent touches. This is least privilege, applied to a context where the industry has collectively forgotten to apply it.

🎯
Log all agent-to-tool connections. Alert on new tool additions. Monitor outbound data volumes. You cannot detect what you do not log, and right now, most organisations are not logging any of it.
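A minimal version of that logging fits in a dozen lines. The sketch below uses illustrative field names and writes to stdout; in practice the records would ship to your SIEM. It emits one structured record per agent-to-tool call, with first-seen tools flagged as the alert condition.

```python
import json
import time

known_tools: set[str] = set()

def log_tool_connection(agent_id: str, tool: str, server: str, bytes_out: int) -> dict:
    """Emit one structured record per agent-to-tool call; flag first sightings."""
    record = {
        "ts": time.time(),
        "agent": agent_id,
        "tool": tool,
        "server": server,
        "bytes_out": bytes_out,
        "new_tool": tool not in known_tools,  # the alert condition
    }
    known_tools.add(tool)
    print(json.dumps(record))  # ship to your SIEM instead of stdout
    return record
```

With records like these, "a tool the agent discovered mid-session" becomes an alert instead of an invisible runtime event, and outbound byte counts give you a baseline for spotting exfiltration through legitimate channels.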

For Australia: reinstate mandatory guardrails for high-risk AI with explicit provisions for autonomous agent deployments. Introduce agent-specific security standards covering skill registry vetting, MCP authentication mandates, and runtime monitoring for regulated industries. Give AISI enforcement authority; an advisory body without it is a research institute, not a regulator. Update Essential Eight and CPS 234 to include runtime dependency monitoring, tool connection governance, and natural language attack vector assessment.

The window between "conditions exist" and "event occurs" is the window for preventive action. That window is open. It will not stay open.