beyond.

Book the call

We ship commerce AI

You bought AI. You haven’t shipped it.

Most commerce AI never reaches production. We’re the operator who gets it there, and runs it. Across catalog, fulfilment and storefront. Billed on outcomes, not slideware.

Book a 30-minute production read

Agents built on Claude

What a human sees

Footwear / Trail

Trail Runner Pro

£129.00 £149.00

4.8 · 312 reviews

What an agent reads

GET /product/trail-runner-pro 200 OK

{
  "name": "Trail Runner Pro" ✓ OK
  "price": null ✕ N/A
  "gtin": null ✕ MISSING
  "availability": null ✕ MISSING
  "schema": none ✕ MISSING
  "agentic_checkout": false ✕ BLOCKED
  "feed_last_updated": "31 days ago" ! STALE
}

Same product page · two completely different readings
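The readout above can be sketched as a simple field check. This is a minimal illustration, not the reader we run: the field list, the `MAX_FEED_AGE_DAYS` threshold and the `read_as_agent` function are assumptions made for the example, and the feed age is passed in as a number of days rather than parsed from text.

```python
from typing import Any

# Assumed for illustration: the fields an agent needs, and a 7-day freshness limit.
REQUIRED_FIELDS = ("name", "price", "gtin", "availability")
MAX_FEED_AGE_DAYS = 7


def read_as_agent(record: dict[str, Any], feed_age_days: int) -> dict[str, str]:
    """Return one status per field, using OK / MISSING / STALE labels."""
    report = {
        field: "OK" if record.get(field) not in (None, "") else "MISSING"
        for field in REQUIRED_FIELDS
    }
    # A feed older than the assumed limit is flagged as stale.
    report["feed"] = "STALE" if feed_age_days > MAX_FEED_AGE_DAYS else "OK"
    return report


# The same product as in the readout: a name, and nothing else an agent can use.
record = {"name": "Trail Runner Pro", "price": None, "gtin": None, "availability": None}
report = read_as_agent(record, feed_age_days=31)
for field, status in report.items():
    print(f"{field}: {status}")
```

The page a human sees can be complete while every field this check looks at is empty. That gap is the point of the comparison.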

01 The shipping gap

Enterprises poured tens of billions into generative AI.

95% saw zero return.

Not low. Zero. MIT, 2025

The cause isn’t the model. MIT found the failure is approach and operating discipline, and that buying from specialists reaches production about twice as often as building in-house. Gartner now tells CIOs the same thing: buy, don’t build.

Built in-house: 33%

Bought from specialists: 67%

Reach production · MIT, 2025

A demo is the floor. Production is the moat.

02 The catch

So you need an operator. Most aren’t real.

The data says bring in a specialist, not an internal build. Here’s the problem: most specialists aren’t one. They are agent washing: rebranded chatbots and RPA with a new logo.

The question was never whether to bring in an operator. It’s which one actually ships.

03 Why commerce, why us

An agent is only as good as the data it can read.

Retail leads on AI adoption and lags on production, according to industry reports. The blocker everyone names is the same: an agent is only as good as the product and inventory data it can read, and in commerce that data is scattered across D2C, marketplaces and stores. That’s not a model problem. It’s an operator problem.

It’s the one I’ve worked for twelve years (SAP, Mirakl, ChannelAdvisor, Adobe, Uberall), and the one we now solve with agents running inside our own ventures. You get the operator, not the org chart.

04 What we ship

Agents in production, not slides.

Each built into your stack, shipped to a named metric, then operated. Ordered by return: back-office and operations first, where the measurable value lands. Pick the bottleneck that’s costing you now.

Catalog & product-data agents

A multi-brand retailer with tens of thousands of SKUs across D2C, marketplace and store systems. The agent reads, standardises and enriches product data so it is consistent and machine-readable everywhere.

Moves: enrichment turnaround and AI-surface visibility

Order & fulfilment ops

A retailer whose stock and orders live in systems that don’t reconcile. The agent routes orders and keeps inventory truthful across channels in real time.

Moves: oversell and cancellation rate, manual reconciliation hours

CX concierge

A team buried in “where’s my order” tickets. The agent resolves and closes first-line queries in production; humans keep the judgement calls.

Moves: first-contact resolution and response time

Transactable storefronts

A storefront an AI shopping agent can’t buy from. We make price, stock, delivery and checkout machine-readable and completable.

Moves: agent-completable checkout and AI-sourced conversion

Research & ops agents

The repeatable between-systems work: reconciliation, briefings, exceptions. The agent takes the load; humans keep judgement.

Moves: hours returned and error rate

One execution layer we operate and stand behind. Orchestrating best-in-class platforms, never the same thing sold twice.

05 How it works

Read → Sprint → Operate.

We don’t hand you a roadmap and leave. We diagnose the bottleneck, ship one agent to clear it, and run it.

01 Read

30 minutes on your stack, data and one stalled workflow. An honest verdict on whether there’s a job worth doing.

02 Sprint

Fixed scope. One agent, into your stack, into production, against a named metric. Humans-in-the-loop until it earns autonomy.

03 Operate

We run it. Weekly tuning, monthly evals, on-call. Billed on outcomes we both measure.

06 Proof you can check

We don’t ask you to take it on faith.

We read your store the way an AI buyer does and show you exactly where it breaks: price, schema, stock, checkout. The same read we run across the market.

Live readiness data · June 2026

We read 32 UK retailers the way an AI buyer does.

An agent reads your structured data, not your homepage, to decide what to recommend and buy. Here is what it couldn’t read.

No global product identifier (GTIN): 22/32

the agent can't confirm it's the same product, so it recommends a seller it can match.

Product data that renders only after JavaScript: 9/32

an agent reading the page without running JS sees no product at all.

No machine-readable price: 3/32

to an AI shopper the item has no price, and drops out of the comparison.

A product schema with an empty price field: 3/32

it looks ready to an eye, but reads as priceless to a crawler.

Every gap is a sale quietly handed to a retailer the agent could read. None of it is the model's fault, and every one is fixable in weeks.

Read as a crawler sees the page, across six structured-data checks. No retailer is named.
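A minimal sketch of what reading a page as a crawler can look like, assuming the product data is published as schema.org JSON-LD. It parses the raw HTML without running JavaScript and checks three of the gaps listed above. The names `JsonLdExtractor` and `check_product_page` are hypothetical, and the sketch ignores microdata, `@graph` wrappers and other layouts a fuller read would need to handle.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_product_page(raw_html: str) -> dict[str, bool]:
    """Check the raw HTML (no JavaScript executed) for Product, GTIN and price."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)

    product: dict = {}
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is unreadable to a crawler too
        if isinstance(data, dict) and data.get("@type") == "Product":
            product = data
            break

    offers = product.get("offers") or {}
    if isinstance(offers, list):
        offers = offers[0] if offers else {}
    if not isinstance(offers, dict):
        offers = {}

    # schema.org allows several GTIN properties; any non-empty one counts.
    gtin_keys = ("gtin", "gtin8", "gtin12", "gtin13", "gtin14")
    return {
        "product_in_raw_html": bool(product),
        "has_gtin": any(product.get(key) for key in gtin_keys),
        "has_price": offers.get("price") not in (None, ""),
    }


# Illustrative page: a Product schema is present, but the price field is empty.
sample = """<html><body>
<script type="application/ld+json">
{"@type": "Product", "name": "Trail Runner Pro", "offers": {"@type": "Offer", "price": ""}}
</script>
</body></html>"""
result = check_product_page(sample)
print(result)
```

A page whose product data is injected only after JavaScript runs would fail the first check here, because the raw HTML contains no Product block at all.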

07 Why founder-led

More AI won’t fix your organisation. Throughput will.

The AI revolution showed up as a mess. Every team bought a tool. Every function ran a pilot. What you got wasn’t intelligence. It was sprawl. More logins, more invoices, more dashboards, and a business moving no faster than before.

Because output was never governed by how busy every part looks. It’s governed by one constraint: the single place the work actually chokes. It’s the oldest idea in operations, and the AI scramble forgot it entirely. You don’t speed a system up by improving everything. You find the bottleneck, and you clear it. Then the next one.

So that’s how we work. Not a transformation programme. One constraint, found. One agent, shipped to clear it, proven against a number. Then the next constraint. Motion you can measure, not motion that looks busy.

The only way to find a constraint fast is to have seen a lot of them. Twelve years inside how commerce actually flows: across global brands, across business models that share almost nothing, and now across the AI companies and workflows all scrambling to make sense of this moment. Enough systems, enough messes, to name the bottleneck before you’ve finished describing it.

You don’t need more AI in your organisation. You need the thing in the way gone.

Taha Zaheer · Beyond Partners

08 Intelligence

Commerce AI intelligence, free to read.

Research-grade writing on why commerce AI stalls before production, and what it takes to ship.

All intelligence →

See it live

We don’t pitch agentic commerce. We show you what the machines see.

We ran a live agentic-readiness teardown on a 240-year-old British heritage house — real requests from the same crawlers behind ChatGPT, Claude and Perplexity. The verdict was uncomfortable, specific and fixable. This is the work, not a slide about the work.

Read a live teardown

09 Questions

The production gap, answered plainly.

What it takes to get commerce AI from pilot to production. Clear answers, no vendor theatre.

Why does most commerce AI never reach production?

The model is rarely the problem. MIT's 2025 research found the failure is approach and operating discipline: pilots stall on messy data, no clear owner, and no path from demo to a live workflow. In commerce specifically, product and inventory data is scattered across D2C, marketplaces and stores, so an agent has nothing trustworthy to act on.

What does 'shipped to production' actually mean here?

An agent running live against a real workflow and a named metric, inside your stack, with humans-in-the-loop until it earns autonomy. Not a slide, not a sandbox demo, not a pilot that quietly expires. Something that runs, that we then operate.

How is this different from a consultancy or systems integrator?

We don't hand over a roadmap and leave. We diagnose the bottleneck, ship one agent to clear it, and run it: weekly tuning, monthly evals, on-call. And we bill on outcomes we both measure, not on seats or hours.

What is 'agent washing'?

Vendors rebranding chatbots and RPA as 'AI agents' without the capability to ship them into production. Gartner estimates only a small fraction of the thousands of agentic-AI vendors are genuine. The question was never whether to bring in an operator, it's which one actually ships.

How does the engagement work?

Read, then Sprint, then Operate. The Read is a 30-minute look at your stack, data and one stalled workflow, with an honest verdict on whether there's a job worth doing. The Sprint ships one agent into production against a named metric. Operate is us running it.

What do you build on?

Agents built on Claude, for reasoning over the fragmented logic of real commerce systems, orchestrated with best-in-class platforms rather than rebuilt from scratch. We disclose every commercial relationship up front, and you buy one outcome owned end to end.

10 The call

Find out what’s stopping your AI from shipping.

A 30-minute read of your stack, your data and one stalled workflow: why it hasn’t reached production, and what it would take. Then we tell you honestly if there’s a job worth doing.

Book a 30-minute production read

or hello@beyond.partners