If most organisations can't show a return on AI (only 39% report any EBIT impact in McKinsey's State of AI 2025 survey, and just 4 of the top 50 banks saw realised ROI in 2025, per Coinlaw), then measuring ROI is itself the competitive skill. That's a subtle but important reframe. The bottleneck isn't always whether AI creates value; it's whether you can prove it created value, in numbers a finance team will accept. Plenty of genuinely useful pilots die not because they failed, but because nobody could show they succeeded.
Why AI ROI is hard to see
- Benefits are often diffuse — a few minutes saved per task, spread across hundreds of people — and small, scattered savings are notoriously hard to roll up into a number on a report.
- The full cost is rarely tracked against the benefit. Teams remember the headline API bill but forget the monitoring, the prompt upkeep, the integration work and the human review time that keep the feature honest.
- Perception misleads. METR's randomised controlled trial found that experienced developers took about 19% longer to finish tasks with AI tools, while believing the tools had made them roughly 20% faster. If the people doing the work can't reliably sense the effect, then "it feels like it's helping" is worthless as evidence. Vibes aren't ROI.
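To make the "diffuse benefits" point concrete, here is a minimal sketch of rolling small per-task savings up into a single annual figure. The function name and every input value are hypothetical and chosen only for illustration. The minutes saved should come from measurement, not from what people report feeling.

```python
def annual_value_of_time_saved(
    minutes_saved_per_task: float,
    tasks_per_person_per_day: float,
    people: int,
    working_days_per_year: int,
    loaded_hourly_cost: float,
) -> float:
    """Roll small per-task savings up into one annual gross figure."""
    hours_saved = (
        minutes_saved_per_task
        * tasks_per_person_per_day
        * people
        * working_days_per_year
        / 60  # minutes to hours
    )
    return hours_saved * loaded_hourly_cost


# Hypothetical inputs: 3 measured minutes saved per task, 10 tasks a day,
# 200 people, 220 working days a year, loaded cost of 40 per hour.
gross_value = annual_value_of_time_saved(3, 10, 200, 220, 40)
print(gross_value)  # 880000.0
```

This is a gross figure only. It says nothing yet about the running costs listed above, which still have to be subtracted.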
Why this matters
This helps explain the widely quoted figure that 95% of AI pilots fail: a pilot with no baseline and no measured target can't be defended when budgets tighten. The team that can walk into a review and say "this workflow cost X before, costs Y now, here's the running cost, here's the net" survives the cut. The team waving a demo and a good feeling does not. Measurement isn't bureaucracy here; it's the thing that keeps a working feature alive.
A practical approach
1. Pick one workflow with a baseline. Measure its current cost, time and quality before AI touches it. If you skip this step, you've lost the comparison forever; you can't reconstruct a baseline after the fact.
2. Instrument the AI version. Track its true running cost (inference, monitoring, review) and its measured output, not its perceived output.
3. Compare like for like, including the maintenance tail. The launch is the cheap part; prompts drift, models change, and someone has to keep it working. Count that.
4. Tie it to a P&L line: revenue, cost or risk. If you can't connect the feature to a number that shows up in the accounts, be honest that it's a bet, not a return.
If you can't draw a straight line from the AI feature to a number that matters, you don't have ROI — you have a hope.
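The comparison in the first three steps can be sketched as a small calculation. This is an illustrative sketch, not a standard method: the `RunningCosts` class, the `net_monthly_return` function and all figures are assumptions made up for the example. The point it shows is that the running costs, including the maintenance tail, are subtracted before anything is called a return.

```python
from dataclasses import dataclass


@dataclass
class RunningCosts:
    """Monthly cost of keeping the AI version working (hypothetical fields)."""
    inference: float
    monitoring: float
    human_review: float
    maintenance: float  # prompt upkeep, model changes, integration fixes

    def total(self) -> float:
        return self.inference + self.monitoring + self.human_review + self.maintenance


def net_monthly_return(
    baseline_cost: float, ai_workflow_cost: float, running: RunningCosts
) -> float:
    """Baseline cost minus the AI version's workflow cost and its running costs."""
    return baseline_cost - ai_workflow_cost - running.total()


# Hypothetical monthly figures for one workflow.
running = RunningCosts(inference=3_000, monitoring=1_000, human_review=4_000, maintenance=2_500)
net = net_monthly_return(baseline_cost=50_000, ai_workflow_cost=32_000, running=running)
print(running.total())  # 10500
print(net)              # 7500
```

A negative result is still useful: it says the feature is currently a bet, which is the honest label the fourth step asks for.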
A concrete example
A support team adds an AI assistant to draft replies. Before: agents handled 30 tickets a shift at a known cost per ticket. After: 42 tickets a shift, at a measurably higher quality score, against a running cost you can read off a dashboard. That's a defensible ROI — a clear before, a clear after, and the full cost subtracted. Contrast the version where the team reports "agents love it" with no numbers: identical tool, completely different fate when finance asks the hard question.
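Worked through with numbers, the support example might look like the sketch below. Only the 30 and 42 tickets per shift come from the example above. The cost of an agent shift and the AI running cost per shift are hypothetical values picked to keep the arithmetic easy to follow.

```python
# From the example: tickets handled per agent shift.
tickets_before = 30
tickets_after = 42

# Hypothetical figures, for illustration only.
shift_cost = 240.0         # loaded cost of one agent shift
ai_cost_per_shift = 12.0   # inference, monitoring and review, per agent shift

throughput_uplift = (tickets_after - tickets_before) / tickets_before
cost_per_ticket_before = shift_cost / tickets_before
cost_per_ticket_after = (shift_cost + ai_cost_per_shift) / tickets_after
saving_per_ticket = cost_per_ticket_before - cost_per_ticket_after

print(f"{throughput_uplift:.0%}")  # 40%
print(cost_per_ticket_before)      # 8.0
print(cost_per_ticket_after)       # 6.0
print(saving_per_ticket)           # 2.0
```

The AI cost is added to the "after" side before dividing, so the saving per ticket is already net of the running cost.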
The forward-looking takeaway is that measurement compounds. The first workflow you instrument is painful; by the third, you have a repeatable method for deciding what to scale and what to kill. In a landscape where most can't show a return, that discipline is the edge. If you want help building it into a project, get in touch.
Sources
- McKinsey — The State of AI 2025
- METR — Developer productivity RCT
- Coinlaw — AI in Banking Statistics 2025