Hello again! Crypto folks talk a ton about decentralization in the technical sense. But decentralization is about power, and distributing the power to the edges—or to the people—often requires more than just technical systems. This is part of what makes the collision between crypto and AI technologies so fascinating. As journalists, we plan to engage with this collision with great care, maintaining our focus on a basic question: What does it mean to decentralize AI?
Our first major foray was a panel discussion that Michael Reilly moderated in Washington, DC, in September, featuring Art Abal of Vana and Jay Stanley of the ACLU. Today’s edition reflects on one particularly urgent idea that surfaced during that conversation. After that, we have a highlight from the DC Privacy Summit.
Who will own the training data?
By Mike Orcutt
Whether AI will be good or bad for society in the long run may hinge on how widely its power is distributed. Jay Stanley, a senior policy analyst at the American Civil Liberties Union, compares AI to weapons technologies that have changed the nature of warfare in the past: Will it be more like tanks or like muskets? Tanks are technically complicated and massively expensive. Only large, wealthy entities that command tremendous amounts of labor and industrial resources can produce them. That makes them “inherently authoritarian,” he says. By comparison, muskets, rifles, and other small arms are simple and easy to distribute widely—they’re “inherently democratic.”
At the moment, AI is more like tanks—primarily built and deployed by companies with vast amounts of capital and infrastructure at their disposal. As Stanley noted, however, the rise of models like DeepSeek R1 and its later iterations shows that cheaper, open-source models have the potential to change the playing field.
One factor that will weigh heavily on how things play out is who owns the data these models are trained on.
“The power in AI is actually data now,” Art Abal, managing director of the Vana Foundation, said at Project Glitch’s session of PGP* for Crypto, a gathering of crypto policy insiders in Washington, DC. Abal joined remotely from Australia to chat with Stanley and Project Glitch editor Michael Reilly about what decentralization can do for AI.
“We’ve reached what’s called a data wall. All of the public data has already gone into a lot of these [large language models],” Abal said. Now all the valuable data lives in what he called “private data silos” owned by technology companies. “That’s essentially what we’re trying to break up in creating a decentralized protocol for data,” he said.
The company Abal cofounded, Vana, has devised a system that lets users privately store personal data—say, all their Reddit data, which can be acquired by submitting a form—and control access to it using rules encoded in blockchain smart contracts. Vana has also created a mechanism by which users can pool their data to create valuable datasets they collectively own. There are 15 such “data collectives” listed on the company’s website. The most high-profile example is probably RedditDAO, which promises to let you “own your Reddit history.”
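Vana’s actual contract code isn’t shown in this piece, but the shape of the idea is easy to sketch. Below is a Python toy, with all names and rules invented for illustration, that models the kind of grant/revoke/expiry logic a data-permission contract might encode:

```python
# Hypothetical sketch of the access rules a data-permission smart contract
# might encode. This is NOT Vana's actual code; every name here is invented.
from dataclasses import dataclass, field
import time


@dataclass
class DataPermission:
    """Who may read a user's dataset, and until when."""
    grantee: str        # identifier of the party granted access
    expires_at: float   # Unix timestamp after which access lapses


@dataclass
class PersonalDataVault:
    """A simplified stand-in for an on-chain access-control contract."""
    owner: str
    permissions: dict = field(default_factory=dict)

    def grant(self, caller: str, grantee: str, duration_s: float) -> None:
        # Only the owner can grant access. A real contract would check
        # msg.sender; here the caller is passed in explicitly.
        if caller != self.owner:
            raise PermissionError("only the owner can grant access")
        self.permissions[grantee] = DataPermission(grantee, time.time() + duration_s)

    def revoke(self, caller: str, grantee: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the owner can revoke access")
        self.permissions.pop(grantee, None)

    def may_read(self, grantee: str) -> bool:
        perm = self.permissions.get(grantee)
        return perm is not None and time.time() < perm.expires_at


# Usage: grant a model trainer 30 days of access, then revoke it.
vault = PersonalDataVault(owner="0xAlice")
vault.grant(caller="0xAlice", grantee="0xTrainer", duration_s=30 * 86400)
assert vault.may_read("0xTrainer")
vault.revoke(caller="0xAlice", grantee="0xTrainer")
assert not vault.may_read("0xTrainer")
```

On a real chain, the caller check would come from the transaction itself, and the data would presumably stay encrypted off-chain, with the contract gating who gets the decryption keys.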
Stanley expressed skepticism about the business of helping people make money from their personal data because he’s seen so many failed attempts over the years, starting long before blockchains emerged. And it’s not clear how much room there is to improve large language models (LLMs) with additional data, at least “compared to the huge leap that we achieved by training LLMs on the whole Internet,” he argued.
But it’s not just about LLMs, Abal said. “We’ve also got to remember that the majority of the AI we interact with every day is not just LLMs,” he said. “We interact with it whenever we get suggested things, we interact with it whenever we go to the doctor’s, we interact with it in all sorts of facets of life.” That’s why data—and especially “niche data” like data on specific kinds of human interactions with AI—is so valuable, he said.
Indeed, the power that companies like OpenAI, Anthropic, Meta, Google, and others have in this realm—not just to build better LLMs but also to build other kinds of models—is increasing as they collect more and more of this data, he said. The quality of public datasets is improving, but “the majority of data that creates that edge in terms of AI is exchanged in these private backdoor deals or is trained on data that the platforms collect themselves.” This helps explain why Anthropic, which had previously said it would not rely on user data for model training, recently changed its policy and will now use that data by default, Abal said.
For many good reasons, not everyone will want to hand over their data. “There’s an enormous need for people to use models that have privacy, either personal privacy or just organizations that can’t share their documents,” said Stanley.
Users should also have the power to pack their data up and move from one platform to another, Abal argued. Without this kind of “data portability,” a “flywheel effect” will just keep increasing big AI companies’ power, he said, at the expense of users.
Take ChatGPT, for example, he said: “It trains on your data, unless you opt out, so the more data you put in, the better the model gets, which increases your incentives to use that same model, which then increases the amount of data that you put in that model, which then makes that model better and increases your incentives to use that model.”
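To see why that loop compounds, here is a toy simulation. Every constant and functional form in it is invented for illustration; none of it reflects ChatGPT’s real metrics:

```python
# Invented toy model of the data flywheel: more users -> more data ->
# better model -> more users. Constants are arbitrary, for illustration only.
users, data = 1_000.0, 0.0

for month in range(1, 7):
    data += users * 10                     # each user adds ~10 data points
    quality = 1 - 1 / (1 + data / 50_000)  # diminishing returns on data
    users *= 1 + 0.5 * quality             # better quality attracts users
    print(f"month {month}: users={users:,.0f} quality={quality:.2f}")
```

Even with diminishing returns on data baked in, the user base compounds month over month, which is exactly the dynamic data portability is meant to counter.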
PRIVACY SUMMIT INSIGHT
Zero-knowledge credentials are just step one
Many compelling themes emerged from the conversations at Project Glitch’s DC Privacy Summit last month. One that stood out was how much conceptual and technical work remains before a truly secure alternative to today’s anti-money laundering (AML) systems is ready for prime time.
There are good reasons to want to supplant the current AML regime, and new tools that might make it possible:
It calls on financial institutions to collect and store vast amounts of personal data—a costly and arguably outdated practice that leaves regular people’s sensitive personal information vulnerable to hackers.
The current system also doesn’t work for decentralized software, where there is no human in the middle to verify customer IDs and watch for suspicious transactions.
We have powerful new cryptographic capabilities that could serve as components of these new AML systems.
Many of the Privacy Summit sessions touched on the idea that zero-knowledge cryptography could be used to anonymously prove statements about a user’s identity, from their national citizenship to the fact that they are older than 18 or 21. These tools are ready today; Google is already using them. But anti-money laundering systems require more than just static checks, noted Zcash co-inventor and University of Maryland cryptographer Ian Miers in one of the most insightful moments of an insight-packed day.
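Miers’s distinction is easier to see with a concrete “static check” in hand. Here is a minimal Schnorr-style zero-knowledge proof in Python, with deliberately tiny, insecure toy parameters. It demonstrates only the core principle, convincing a verifier that you hold a secret without revealing it, and is nothing like the richer credential systems the panel discussed:

```python
# Toy Schnorr-style zero-knowledge proof: prove knowledge of x with
# y = g^x mod p without revealing x. Parameters are far too small to be
# secure; this illustrates the principle only.
import hashlib
import secrets

q = 1019           # prime order of the subgroup
p = 2 * q + 1      # 2039, also prime
g = 4              # generates the order-q subgroup of quadratic residues

x = secrets.randbelow(q)   # prover's secret
y = pow(g, x, p)           # public value tied to the credential


def prove(secret: int) -> tuple:
    """Prover: commit, derive a Fiat-Shamir challenge, respond."""
    r = secrets.randbelow(q)
    t = pow(g, r, p)
    c = int.from_bytes(hashlib.sha256(f"{t}|{y}".encode()).digest(), "big") % q
    s = (r + c * secret) % q
    return t, s


def verify(t: int, s: int) -> bool:
    """Verifier: recompute the challenge and check g^s == t * y^c (mod p)."""
    c = int.from_bytes(hashlib.sha256(f"{t}|{y}".encode()).digest(), "big") % q
    return pow(g, s, p) == (t * pow(y, c, p)) % p


t, s = prove(x)
assert verify(t, s)   # the verifier is convinced, yet learns nothing about x
```

A proof like this answers one fixed question, once. Watching transaction patterns over time, which AML actually requires, is where the conceptual work Miers pointed to still lies ahead.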
Watch the full panel discussion, which featured Miers as well as Ross Schulman of SpruceID and Laz Pieper of the DeFi Education Fund, here. And here’s a playlist of every session from the day.

