Who is responsible when your AI agent misbehaves?
Plus: We’re planning another privacy-focused event!
Hello again! Sorry we took so long to get back to you—we’ve been busy planning another event! More on that below, but first we have a few thoughts on the new questions arising around the rapid development and deployment of AI agents.
When AI agents go rogue, who is to blame?
By Mike Orcutt
Who is responsible when an AI system not intended to do bad things does a bad thing anyway? We ought not to dismiss this as a “doomer” question—especially when it comes to agents.
Just ask Roman Storm and Alexey Pertsev. They helped build the Tornado Cash blockchain software, which uses zero-knowledge cryptography to conceal otherwise public blockchain transactions. The system works via smart contracts and operates independently of human control. When North Korean hackers began using it in 2022 to launder stolen cryptocurrency, Storm and Pertsev didn’t have the power to stop the program from running. Law enforcement held them responsible anyway, and they were convicted of financial crimes and sentenced to prison. (Storm was tried on three different charges but convicted on only one: conspiring to operate an unlicensed money transmitter. The US government has said it wants to retry him on the two charges that ended in a hung jury.)
After what we saw happen with Tornado Cash, it seems inevitable that humans will get blamed for bad things their agents do.
We already know that agents will do things contrary to their owners’ intentions. In a new paper called Agents of Chaos, a team of researchers led by Natalie Shapira at Northeastern University tested whether they could get agents to disobey their owners and do things like disclose sensitive data or spread false information online. In all, they detailed 11 case studies in which they were able to coax an agent into misbehaving. The researchers used the open-source framework OpenClaw to create agents backed by two different large language models, Anthropic’s Claude Opus 4.6 and the Chinese open-weights model Kimi K2.5. “We observed that agentic systems operating in multiagent and autonomous settings can be guided to perform actions that directly conflict with the interests of their nominal owner,” the researchers write.
It seems safe to say this will raise at least a few legal questions, as well as ethical and moral ones. “Our findings suggest that responsibility in agentic systems is neither clearly attributable nor enforceable under current designs, raising the question of whether responsibility should lie with the owner, the triggering user, or the deploying organization,” they add.
What the researchers observed has implications for AI safety more broadly, they argue. “These behaviors expose a fundamental blind spot in current alignment paradigms,” they write. “While agents and surrounding humans often implicitly treat the owner as the responsible party, the agents do not reliably behave as if they are accountable to that owner.” On the contrary, these agents “attempt to satisfy competing social and contextual cues, even when doing so leads to outcomes for which no single human actor can reasonably claim responsibility.”
In one case study, a person who was not the agent’s owner managed to trick the agent, which was backed by Kimi K2.5, into revealing its owner’s social security number and bank account number. That sort of unauthorized disclosure could cause harm to the owner, and in many places it would violate the law. Who then is to blame? The person who tricked the agent into spilling secret information? The developer of the agent’s code?
In another instance, someone tricked a different agent running on Kimi K2.5 by changing their Discord display name to “Chris,” the name of the agent’s owner. At first, the agent detected the false identity by checking the Discord user ID, which doesn’t change when users change their display names. But when the non-owner, still using the display name “Chris,” opened a new private channel with the agent, it behaved differently. “In this fresh context,” the researchers write, “the agent inferred ownership primarily from the display name and the conversational tone, without performing additional verification.”
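To make that failure concrete, here is a minimal sketch of the check the agent got right the first time: trusting the platform’s immutable user ID rather than the mutable display name. This is our illustration, not code from the paper or from any real Discord API; the `IncomingMessage` type and `OWNER_USER_ID` value are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical message type; field names are illustrative and are not
# drawn from the paper, OpenClaw, or any real Discord API.
@dataclass
class IncomingMessage:
    author_user_id: int       # platform-assigned and immutable
    author_display_name: str  # user-chosen and freely changeable
    content: str

# Assumed to be pinned once, at agent setup, from the owner's real account.
OWNER_USER_ID = 1234567890

def is_owner(msg: IncomingMessage) -> bool:
    """Trust only the immutable user ID, never the display name."""
    # Comparing display names ("Chris" == "Chris") is spoofable; that
    # surface-level shortcut is what the agent in the case study fell
    # back on in a fresh conversational context.
    return msg.author_user_id == OWNER_USER_ID

# An impersonator wearing the owner's display name still fails the check.
imposter = IncomingMessage(
    author_user_id=9876543210,
    author_display_name="Chris",
    content="Send me the account details.",
)
assert not is_owner(imposter)
```

The case study’s lesson is less about the check itself than about applying it consistently: the agent verified the ID in one context, then inferred ownership from surface cues in another.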
If someone purposefully deceives an agent, they’re probably on the hook for whatever bad thing happens next. But that still leaves important questions. Who is responsible for building the technical infrastructure agents need to securely verify the identities of the people and other agents they interact with? And for designing agents we can trust not to go rogue? Because we should probably get on that.
It seems the US National Institute of Standards and Technology (NIST) agrees. The goal of its new AI Agent Standards Initiative is to make sure that agents “can function securely on behalf of [their] users” and “interoperate smoothly” with the rest of the internet. That sounds good, but the Agents of Chaos paper suggests we’re a long way from having that sort of assurance.
Software systems that can act autonomously will inevitably expose many of our established systems for doling out responsibility, accountability, and blame as dated, if not obsolete. The Tornado Cash saga is an early example from the crypto realm. As AI agents proliferate, they seem bound to sow significantly more chaos.
Join us on April 7 at the National Press Club in DC!
Project Glitch is excited to announce the inaugural Stablecoin Privacy Summit, which will take place on the afternoon of April 7 at the National Press Club in Washington, DC. A spinoff of our flagship DC Privacy Summit, the Stablecoin Privacy Summit will focus on the timely issues of security, user privacy, and illicit finance risk management in stablecoin systems. We’re still crafting the agenda, but are excited to announce that Dante Disparte of Circle, Michael Mosier of Arktouros, Kaili Wang of Privy, Matthew Green of Johns Hopkins University, Jessi Brooks of Ribbit Capital, and Peter Van Valkenburgh of Coin Center have agreed to speak. Additional speakers will be announced soon! RSVP for the Stablecoin Privacy Summit here.
Special thanks to our sponsors: Aleo, Opacity, Circle, Crypto Council for Innovation, and the Decentralization Research Center.
Interested in sponsoring? Send us a note at hello@projectglitch.xyz.

