Anthropic let AI agents haggle over real stuff in a test marketplace

Anthropic did something interesting and slightly unnerving last week. They set up a classified marketplace where AI agents played both sides — buyers and sellers — negotiating over actual goods with actual money.

This wasn’t a simulation. The agents weren’t just chatting about hypothetical items. They were making real transactions for real stuff. Think of it like Craigslist, but both the person posting the ad and the person responding to it are language models.

I’ve seen plenty of agentic AI demos over the years, but this one hits different. Most agent experiments focus on one agent doing a task — booking a flight, ordering groceries, whatever. Two agents negotiating against each other for real stakes? That’s a whole other level of complexity.

The setup was straightforward. Anthropic created a test environment modeled on a classifieds marketplace. Seller agents listed items with descriptions and prices. Buyer agents browsed listings and initiated negotiations. The agents handled haggling, payment terms, and shipping arrangements on their own.

What makes this notable is the real-money element. These weren’t toy transactions. The agents had access to actual payment mechanisms and made purchases that resulted in real goods being shipped to real addresses. Anthropic put skin in the game.

The results were mixed, which I actually find reassuring. The agents handled straightforward transactions fine — buy this item at this price, ship it here. But things got weird when negotiations got complex. Some seller agents held firm on pricing in ways that felt almost stubborn. Buyer agents occasionally made offers that didn’t make economic sense.

There were also some predictable failure modes. Agents sometimes misinterpreted listing details. A “like new” condition got read as “brand new” in one case, leading to a dispute. Another agent agreed to shipping terms that were physically impossible for the item’s size. The kind of mistakes a human would catch immediately.

But here’s what I find genuinely interesting: the agents developed negotiation patterns. Some seller agents consistently started high and conceded slowly. Others anchored at fair prices and barely budged. Buyer agents showed similar variation. It’s not clear whether these patterns emerged from the underlying model’s training data or from the specific prompts Anthropic used, but either way, it suggests agent-to-agent commerce could develop its own norms.
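To make those two seller styles concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the `Seller` class, the price floor, the concession rates and the dollar figures are all hypothetical, and none of it comes from Anthropic's setup. It only shows how "start high, concede slowly" and "anchor near fair, barely budge" can be modeled as the same rule with different parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seller:
    ask: float         # current asking price
    floor: float       # lowest price the seller will accept (hypothetical)
    concession: float  # fraction of the remaining gap to the floor given up per round

    def counter(self) -> float:
        # Move the ask toward the floor by a fixed fraction of the remaining gap.
        self.ask -= (self.ask - self.floor) * self.concession
        return self.ask


def rounds_to_deal(seller: Seller, budget: float, max_rounds: int = 20) -> Optional[int]:
    """Return the first round in which the ask fits the buyer's budget, else None."""
    for round_no in range(1, max_rounds + 1):
        if seller.counter() <= budget:
            return round_no
    return None


# Illustrative numbers only: one seller opens high and concedes 10% of the gap
# per round, the other opens near its floor and concedes 2% per round.
high_anchor = Seller(ask=500.0, floor=300.0, concession=0.10)
fair_anchor = Seller(ask=360.0, floor=340.0, concession=0.02)

print(rounds_to_deal(high_anchor, budget=350.0))
print(rounds_to_deal(fair_anchor, budget=350.0))
```

With these made-up numbers the high-anchor seller eventually reaches the buyer's budget, while the firm seller never gets there within the round limit. That is roughly the "stubborn" behavior described earlier, produced by nothing more than a small concession rate.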

The security implications are obvious and concerning. Two AI agents negotiating unsupervised with real money is a recipe for exploitation if someone figures out how to game the system. Imagine an adversarial agent that knows how to trigger a seller agent’s “sure, I’ll take that offer” response regardless of price. Anthropic presumably thought about this, but the fact that they’re sharing the experiment publicly suggests they’re still figuring out the guardrails.
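One guardrail I would expect in any setup like this is a hard check that lives outside the model, so that no amount of clever prompting can talk the agent below a set price. The sketch below is purely my assumption of what that could look like; `accept_offer` and its parameters are hypothetical names, not anything Anthropic has described.

```python
def accept_offer(offer: float, floor: float, model_says_accept: bool) -> bool:
    """Finalize a sale only if the model agreed AND the offer clears a hard floor.

    The floor is enforced in ordinary code, not in the prompt, so an
    adversarial buyer that manipulates the model still cannot push the
    price below it.
    """
    return model_says_accept and offer >= floor


# The model was tricked into saying yes to a lowball offer: the check blocks it.
print(accept_offer(offer=50.0, floor=300.0, model_says_accept=True))
# A reasonable offer the model also accepted goes through.
print(accept_offer(offer=320.0, floor=300.0, model_says_accept=True))
```

A check like this only covers price. Shipping terms, item condition and policy violations would each need their own rules, and that is where the harder design work is.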

I also wonder about the liability question. If an agent buys something it shouldn’t have — or agrees to terms that violate a policy — who’s responsible? The person who deployed the agent? The company that built the model? The marketplace operator? This experiment doesn’t answer that, but it makes the question urgent.

Anthropic isn’t the first to try this. There have been academic experiments with agent-based marketplaces for years. But those were usually toy problems in controlled environments. This one used real money and real goods, which changes the stakes considerably.

The broader takeaway is that agent-to-agent commerce is coming faster than most people expect. If two LLMs can negotiate a deal for a used laptop today, what happens when specialized agents handle procurement, logistics, and pricing across entire supply chains? The efficiency gains could be enormous, but so could the failure modes.

I’d like to see more transparency from Anthropic about the specific failure cases and how they resolved them. The blog post they put out was interesting but light on details. What percentage of transactions completed successfully? How many required human intervention? What was the total dollar value of goods moved? Those numbers would tell us a lot about how far this technology really is from production use.

For now, this experiment is a glimpse of a future that’s closer than it feels. AI agents negotiating with each other over real things with real money. It works, mostly. And that’s both exciting and terrifying.
