Why 95% of enterprise AI pilots fail — and what the 5% do differently

In August 2025, MIT's NANDA initiative published *The GenAI Divide: State of AI in Business 2025*, and one number dominated coverage: 95% of enterprise GenAI pilots deliver little to no measurable return. Only around 5% achieve rapid revenue acceleration. It's the kind of statistic that gets read two ways — as proof the technology is overhyped, or as proof most companies are using it badly. The report's own data points firmly at the second reading.

It's worth taking seriously because of how it was built: 150 executive interviews, a 350-employee survey, and analysis of 300 public AI deployments. That's not a vendor whitepaper or a single anecdote scaled into a trend — it's a broad look at what actually happens after the press release.

It's not the models

Asked why their pilots stalled, executives mostly blamed regulation and model performance. MIT's data pointed somewhere far less convenient: a "learning gap." The tools didn't adapt to how people actually worked, and the organisations didn't redesign their workflows around the tools. The model was rarely the bottleneck. The integration, the ownership and the willingness to change a process were.

This matters because it's a fixable diagnosis. If the problem were "models aren't good enough," you'd be stuck waiting for the next release. If the problem is "we bolted a chatbot onto an unchanged process and hoped," that's within your control today.

What the 5% do

1. Buy more than they build: vendor partnerships succeed roughly 67% of the time, while internal builds succeed about one-third as often. That's not a blanket case against building, but it is a warning that building is where most teams underestimate the cost. (We dig into the trade-off in build vs buy vs AI.)
2. Push ownership to line managers, not just a central AI lab. The people who feel the pain of a broken workflow are the ones who make adoption stick.
3. Choose tools that integrate deeply into existing systems and improve over time, rather than novelties that sit beside the real work.

The divide isn't good AI vs bad AI. It's companies that changed how they work vs companies that just bought a tool.

A concrete pattern

The failing pilot has a recognisable shape: a central team picks an impressive tool, runs a demo that wows leadership, deploys it broadly, and then watches usage decay over a few months as people quietly return to the old way because the new way never quite fit. No owner, no baseline, no measured target — so when it doesn't obviously help, nobody can say whether it failed or just wasn't measured.

What this means for your team

If your initiative is stalling, the fix is rarely a better model:

Narrow the scope to one workflow with a clear cost today.
Name a single owner who feels the outcome.
Set a measurable target before launch — and actually measure it. (See measuring AI ROI.)
Integrate deeply enough that the AI path is easier than the old path, not a detour.
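
One way to make the scope, owner and target steps above concrete is to write them down as a small record before launch and check the measured number against it afterwards. The sketch below is illustrative only: the `PilotScorecard` class, its field names, and the invoice figures are all hypothetical and do not come from the MIT report.

```python
from dataclasses import dataclass


@dataclass
class PilotScorecard:
    """Hypothetical pre-launch record for one AI pilot (illustrative names)."""
    workflow: str                 # the single workflow in scope
    owner: str                    # the one person accountable for the outcome
    metric: str                   # what is measured, e.g. minutes per invoice
    baseline: float               # value measured before launch
    target: float                 # value agreed before launch
    lower_is_better: bool = True  # True for cost/time metrics

    def improvement(self, measured: float) -> float:
        """Fractional improvement over the baseline (positive means better)."""
        if self.baseline == 0:
            raise ValueError("baseline must be non-zero")
        change = (self.baseline - measured) / self.baseline
        return change if self.lower_is_better else -change

    def target_met(self, measured: float) -> bool:
        """Compare a post-launch measurement against the agreed target."""
        if self.lower_is_better:
            return measured <= self.target
        return measured >= self.target


# Made-up example values, purely for illustration.
card = PilotScorecard(
    workflow="invoice triage",
    owner="AP team lead",
    metric="minutes per invoice",
    baseline=12.0,
    target=9.0,
)
measured = 10.5  # hypothetical post-launch measurement
print(f"{card.improvement(measured):.1%} better, target met: {card.target_met(measured)}")
```

With a record like this, a pilot that improves but misses its target shows up as exactly that, rather than as an unmeasured disappointment.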

The forward-looking note is almost optimistic. The 95% figure isn't a ceiling on the technology — it's a snapshot of an adoption discipline that most organisations haven't built yet. The companies that learn to scope, own and measure their AI work aren't waiting on a smarter model. They're already in the 5%.

Sources

MIT NANDA — The GenAI Divide
McKinsey — The State of AI 2025

Written by Zain Ali
