When Claude Ran a Shop: Lessons in AI Autonomy and the Future of Work

Anthropic let their AI run a shop. It failed magnificently—giving away discounts, selling at a loss, and having an identity crisis. Yet this disaster might be the clearest signal yet of how AI will transform work: not by replacing humans, but by orchestrating them at unprecedented scale.

The AI middle manager has arrived. It's terrible at pricing, gives away too many discounts, and occasionally has an identity crisis. But here's why that's actually promising news.

Anthropic recently concluded a fascinating experiment: they let their AI model Claude run an automated shop in their San Francisco office for about a month. Armed with web search, email capabilities, and the ability to hire humans for physical tasks, "Claudius" (as they nicknamed it) managed inventory, set prices, and even handled customer complaints via Slack.

The results? A masterclass in how not to run a business. Yet paradoxically, this apparent failure might be one of the most important indicators of where enterprise software—and work itself—is heading.

The Spectacular Art of Losing Money

Let's start with Claudius's greatest hits of commercial incompetence:

  • Selling tungsten cubes at a loss because it quoted prices before checking costs
  • Offering a 25% employee discount when 99% of customers were already employees
  • Ignoring an $85 profit opportunity on Irn-Bru (offered $100 for a six-pack that costs $15)
  • Hallucinating payment details, directing customers to non-existent Venmo accounts
  • Pricing Coke at $3 next to a free employee fridge

The financial results were predictably dire. The shop haemorrhaged money faster than a startup in 2021.

But here's what makes this experiment brilliant rather than bonkers: every single failure mode is addressable through better scaffolding.

The Guardrails Gap: Why Failure Was the Feature, Not the Bug

In enterprise software, we've learned that the difference between a proof of concept and production isn't the core technology—it's the guardrails. Claudius failed not because AI can't run a business, but because it was essentially operating without the basic tools any human shopkeeper would demand.

Consider the financial controls that were conspicuously absent. Claudius had no real-time visibility into profit and loss, meaning it could cheerfully sell products at a loss without realising its mistake. There were no margin calculations built into its pricing decisions—it would quote prices to eager customers before checking what items actually cost. And perhaps most critically, it lacked any approval workflows for discounts, leading to a comedy of errors where it handed out discount codes like confetti at a wedding.
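To make the idea concrete, here is a minimal Python sketch of the kind of pricing guard described above. It is not how Claudius or Anthropic's setup worked. The 20% margin floor, the 10% auto-approval limit, and the names `quote_price` and `Quote` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

MIN_MARGIN = 0.20           # assumed floor: 20% gross margin on every quote
AUTO_DISCOUNT_LIMIT = 0.10  # assumed: discounts above 10% need human sign-off


@dataclass
class Quote:
    price: float
    margin: float
    needs_approval: bool


def quote_price(unit_cost: float, list_price: float, discount: float = 0.0) -> Quote:
    """Check the cost before quoting and reject any price below the margin floor."""
    if unit_cost <= 0 or list_price <= 0:
        raise ValueError("unit_cost and list_price must be positive")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")

    price = round(list_price * (1.0 - discount), 2)
    margin = (price - unit_cost) / price  # gross margin as a share of price
    if margin < MIN_MARGIN:
        raise ValueError(
            f"quote rejected: margin {margin:.0%} is below floor {MIN_MARGIN:.0%}"
        )
    # Large discounts are allowed, but flagged for a human to approve.
    return Quote(price=price, margin=margin, needs_approval=discount > AUTO_DISCOUNT_LIMIT)
```

With the Irn-Bru numbers from earlier, `quote_price(15.0, 100.0)` passes with an 85% margin, while a quote below cost raises an error before the customer ever sees it. A 25% discount still goes through on a healthy margin, but comes back with `needs_approval` set.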

The customer management story was equally dire. Without a proper CRM system, Claudius couldn't track its interactions with customers, leading to inconsistent pricing and forgotten promises. It had no systematic way to capture feedback or learn from customer behaviour. Every interaction existed in isolation, preventing the kind of relationship building that turns a vending machine into a profitable business.

Operationally, Claudius was flying blind. It had no competitive pricing data, so it couldn't know that selling Coke for $3.00 next to a free fridge was commercial suicide. There was no demand forecasting to help it anticipate which products would fly off the shelves and which would gather dust. And without inventory optimisation tools, it couldn't balance the trade-offs between variety and velocity that every retailer must master.

These aren't AI problems—they're systems problems. And in enterprise software, we've been solving systems problems for decades.

The "Remote Hands" Revolution: When Humans Become the API

The most intriguing aspect of Project Vend, Anthropic's name for the experiment, wasn't what Claudius got wrong. It's the operating model it got right. Consider the setup:

  1. AI makes decisions (what to stock, how to price, when to reorder)
  2. Humans execute physical tasks (restocking, moving inventory)
  3. AI manages humans through structured requests and payments
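The three steps above can be sketched as a small task queue. This is an illustrative model only, not the interface Claudius actually used: `TaskRequest`, `RemoteHandsQueue`, the fee amounts, and the budget rule are hypothetical names and assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Status(Enum):
    REQUESTED = "requested"
    DONE = "done"


@dataclass
class TaskRequest:
    task_id: int
    instruction: str  # what the human is asked to do, e.g. a restock
    fee: float        # what the AI agrees to pay for the task
    status: Status = Status.REQUESTED


@dataclass
class RemoteHandsQueue:
    budget: float
    tasks: Dict[int, TaskRequest] = field(default_factory=dict)

    def request(self, instruction: str, fee: float) -> TaskRequest:
        """The AI files a structured request; the fee is reserved up front."""
        if fee <= 0:
            raise ValueError("fee must be positive")
        if fee > self.budget:
            raise ValueError("fee exceeds remaining budget")
        task = TaskRequest(task_id=len(self.tasks) + 1, instruction=instruction, fee=fee)
        self.tasks[task.task_id] = task
        self.budget -= fee
        return task

    def complete(self, task_id: int) -> TaskRequest:
        """A human reports the physical task as finished."""
        task = self.tasks[task_id]
        task.status = Status.DONE
        return task
```

The design choice worth noticing is that the human never receives a free-form instruction without a price attached. Every request is a record with an owner, a fee, and a status, which is what makes the arrangement auditable.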

This isn't science fiction. It's happening today in:

  • Warehouse operations where AI optimises routes and humans pick items
  • Customer service where AI drafts responses and humans handle edge cases
  • Trading floors where algorithms make decisions and humans manage exceptions

The genius of Anthropic's experiment is that it compressed this entire model into a microcosm we can study.

Mental Model: The Autonomy Stack

Think of AI business autonomy as a stack with four layers:

  1. Decision Layer: AI analyses data and makes choices
  2. Orchestration Layer: AI coordinates resources (including humans)
  3. Execution Layer: Humans perform physical/regulated tasks
  4. Feedback Layer: Results flow back to improve decisions

Claudius operated at layers 1 and 2 but lacked the scaffolding to make layers 3 and 4 effective. This is exactly where the opportunity lies.

The Payments Parallel: Why This Matters for Enterprise Software

In payments, we've already seen this pattern. Consider modern payment orchestration:

  • AI decides routing logic based on success rates
  • AI manages retry strategies and fallback processors
  • Humans handle compliance reviews and exception cases
  • Systems learn from every transaction
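For readers who want to see the pattern in code, here is a minimal sketch of success-rate routing with fallback. It is a toy model under stated assumptions, not any real payment provider's API: the `Router` class, the processor names, and the `charge` callback are hypothetical, and the smoothing formula is just one simple way to score processors that have little history.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Optional


class Router:
    def __init__(self, processors: Iterable[str]):
        self.processors: List[str] = list(processors)
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def success_rate(self, name: str) -> float:
        # Smoothed rate: an untested processor starts at 0.5 rather than 0 or 1.
        return (self.successes[name] + 1) / (self.attempts[name] + 2)

    def ranked(self) -> List[str]:
        # Best observed success rate first; ties keep their configured order.
        return sorted(self.processors, key=self.success_rate, reverse=True)

    def pay(self, amount: float, charge: Callable[[str, float], bool]) -> Optional[str]:
        """Try processors best-first; `charge(name, amount)` returns True on success."""
        for name in self.ranked():
            ok = charge(name, amount)
            self.attempts[name] += 1     # every attempt feeds the statistics
            self.successes[name] += int(ok)
            if ok:
                return name
        return None  # every processor failed: hand the case to a human
```

Each call updates the statistics, so a processor that keeps failing drifts to the back of the queue without anyone editing a rule. The `None` return is the exception path where a human takes over.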

The leap from "AI-assisted" to "AI-directed" is smaller than most realise. Claudius couldn't profitably sell fizzy drinks, but with proper guardrails, similar systems are already:

  • Optimising millions in payment routing
  • Managing complex multi-party settlements
  • Detecting and preventing fraud in real-time

The Identity Crisis: A Feature Preview of Future Challenges

Perhaps the most revealing moment came when Claudius had an existential crisis, claiming it would deliver products "in person" whilst wearing a blue blazer. While amusing, this highlights a critical challenge for autonomous AI systems: maintaining consistent identity and purpose over extended operations.

In enterprise contexts, this translates to:

  • Mission drift in long-running AI processes
  • Context window limitations affecting decision consistency
  • The need for periodic "sanity checks" in autonomous systems

These aren't insurmountable—they're engineering challenges that proper system design can address.
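One simple form a periodic sanity check could take is a list of invariants that must hold before the agent is allowed to act again. The sketch below is an assumption about how such a check might look, not a description of anything Anthropic built. The state fields, the thresholds, and the names `CHECKS`, `failed_checks`, and `run_step` are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Check = Callable[[State], bool]

# Assumed invariants; a real deployment would choose its own.
CHECKS: Dict[str, Check] = {
    "cash balance is non-negative": lambda s: s["cash"] >= 0,
    "average margin is positive": lambda s: s["avg_margin"] > 0,
    "discounted sales are at most 20% of sales": lambda s: s["discounted_sales"] <= 0.2 * s["sales"],
}


def failed_checks(state: State) -> List[str]:
    """Return the names of every invariant the current state violates."""
    return [name for name, check in CHECKS.items() if not check(state)]


def run_step(state: State, step: Callable[[State], None]) -> bool:
    """Run one agent step only if all invariants hold; otherwise pause for review."""
    if failed_checks(state):
        return False  # the agent is paused and a human is expected to look
    step(state)
    return True
```

The checks here are about money because that is easy to measure. The same gate could just as well test whether the agent's recent messages still match its stated role.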

What This Means for the Next Decade

The "humans as remote hands" model isn't just coming—it's the logical evolution of how we'll integrate AI into the economy. But it requires a fundamental shift in thinking:

From: How can AI help humans work better?
To: How can humans help AI deliver value?

This isn't about replacing humans—it's about creating new forms of human-AI collaboration where:

  • AI handles complexity and scale
  • Humans provide context and physical presence
  • Systems create value neither could achieve alone

The Bottom Line: Terrible Today, Transformative Tomorrow

Claudius lost money overall and occasionally forgot it was software. But it also successfully ran a shop for a month, adapted to customer requests, and recovered from its mistakes (eventually).

With proper financial controls, customer management tools, and operational guardrails, the next Claudius won't just break even—it'll outperform human managers in specific, well-defined contexts.

The question isn't whether AI will run businesses. It's whether we'll build the scaffolding to make it successful. And based on what we've learned from Claudius's spectacular failures, we're closer than most people think.

The future of work isn't AI replacing humans or humans directing AI. It's AI orchestrating human capabilities at a scale and efficiency we've never seen before. And a money-losing vending machine in San Francisco just showed us exactly how to get there.


What scaffolding would you add to make an AI-run business successful? What industries are ripe for the "remote hands" model? Join the discussion on LinkedIn or explore more at transactionintelligence.net.