The agentic AI reality check: Gartner’s numbers cut both ways

"Agentic AI", meaning systems that take multi-step actions rather than just answering, is the dominant theme of the moment. Gartner's forecasts capture both the hype and the hangover. The interesting part isn't either number on its own; it's that the same analyst house is publishing explosive-growth and high-failure forecasts at the same time. Both can hold at once, and understanding why is the whole game.

First, a definition, because "agent" has been stretched to mean almost anything. An agent is a system that plans and executes a sequence of steps toward a goal, often calling tools or other systems along the way — booking the meeting, not just drafting the email; reconciling the invoice, not just reading it. The leap from "answers questions" to "takes actions" is exactly where the value and the risk both live.

The growth case

40% of enterprise apps will feature task-specific AI agents by the end of 2026 — up from under 5% in 2025.
By 2028, 33% of enterprise software will include agentic AI (from <1% in 2024), and 15% of day-to-day work decisions will be made autonomously.
Agentic AI could drive ~$450 billion in software revenue by 2035.

The reality check

Over 40% of agentic AI projects will be cancelled by the end of 2027, Gartner predicts — due to escalating costs, unclear value and weak risk controls.
Only 17% of organisations have actually deployed agents today, though 60%+ expect to within two years.

Agents amplify everything — including your gaps. An unreliable workflow doesn't get better when you let it act on its own; it gets faster at being wrong.

Why the failure rate is so high

The cancellation number isn't a verdict on the technology; it's a verdict on how the technology gets adopted. Three causes recur. The first is compounding error: chain five steps that are each 90% reliable and, if the steps fail independently, the end-to-end success rate is 0.9^5, or only about 59%. Autonomy multiplies small unreliabilities into big ones. The second is unbounded cost: an agent that can loop, retry and re-plan can quietly burn a fortune in tokens before anyone notices. The third is weak risk controls: giving a probabilistic system the ability to act on real systems without guardrails turns a hallucination into an incident.
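The first two causes are easy to make concrete. The sketch below is illustrative only: it assumes steps fail independently, and `TokenBudget` and `BudgetExceeded` are hypothetical names for a hard spending cap, not part of any real agent framework.

```python
import math


def end_to_end_success(step_reliabilities):
    """Probability that every step succeeds, assuming independent steps."""
    return math.prod(step_reliabilities)


class BudgetExceeded(RuntimeError):
    """Raised when a step would push a run past its token cap."""


class TokenBudget:
    """Hypothetical hard cap on tokens for a single agent run."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        # Refuse the step before spending, so a retry loop cannot overshoot.
        remaining = self.max_tokens - self.used
        if tokens > remaining:
            raise BudgetExceeded(f"step needs {tokens} tokens, {remaining} left")
        self.used += tokens


if __name__ == "__main__":
    # Five steps at 90% each: roughly 59% end to end.
    print(round(end_to_end_success([0.9] * 5), 4))

    budget = TokenBudget(max_tokens=10_000)
    budget.charge(4_000)  # first planning pass
    budget.charge(4_000)  # one retry
    try:
        budget.charge(4_000)  # a second retry would exceed the cap
    except BudgetExceeded as err:
        print("stopped:", err)
```

The point of charging before the call rather than after is that the run stops at the cap instead of discovering the overspend in the next invoice.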

Picture a procurement agent meant to handle routine reordering. In a demo it's magic. In production it misreads a supplier's price field, orders ten times the intended quantity, and — because no human sat in the loop for "irreversible spend" — the purchase order is already out the door. That's not a model-quality problem; it's a design problem. The model did roughly what models do. The system around it failed to contain the consequences.
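As a rough sketch of what "containing the consequences" could look like for this scenario, the code below holds any order that is unusually large or expensive until a person approves it. Everything here is an assumption made for illustration: the `PurchaseOrder` shape, the three-times-usual quantity rule, the spend limit, and the `approve` callback that stands in for a real review step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseOrder:
    sku: str
    quantity: int
    unit_price: float

    @property
    def total(self):
        return self.quantity * self.unit_price


def needs_human_approval(order, usual_quantity, spend_limit):
    """Flag orders that look unusual or cross the spend limit.

    The 3x multiplier is an arbitrary illustrative threshold.
    """
    too_many = order.quantity > 3 * usual_quantity
    too_costly = order.total > spend_limit
    return too_many or too_costly


def submit(order, usual_quantity, spend_limit, approve):
    """Send routine orders; hold anything flagged unless a human approves.

    `approve` is a callable that asks a person and returns True or False.
    """
    if needs_human_approval(order, usual_quantity, spend_limit):
        if not approve(order):
            return "held"
    return "sent"


if __name__ == "__main__":
    # The agent misread a field and wants ten times the usual quantity.
    bad_order = PurchaseOrder(sku="BOLT-M8", quantity=1000, unit_price=0.40)
    status = submit(bad_order, usual_quantity=100, spend_limit=500.0,
                    approve=lambda order: False)
    print(status)
```

The gate does not make the model read the price field correctly. It only ensures that a misread cannot turn into an irreversible purchase without someone seeing it first.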

Holding both forecasts at once

The reason the growth forecast and the cancellation forecast aren't contradictory is that they describe different things. The growth number describes demand: vendors are shipping agent features and boards are asking about them, so the count of "apps with an agent" climbs steeply almost regardless of whether those agents work. The cancellation number describes outcomes: of the projects that get seriously attempted, a large share won't survive contact with cost, reliability and risk reviews. A surge in attempts and a high failure rate among them can happen simultaneously, and if both forecasts hold, they will. The strategic question for any given team is which side of that split they intend to be on, and that's mostly determined before a line of code is written, by scope and guardrails rather than by model choice.

It's also worth resisting the framing that 2026 is a deadline you must hit. Being early to a pattern with a forecast cancellation rate above 40% is not obviously an advantage. The teams that wait for a genuinely valuable, bounded use case, and instrument it properly, will frequently beat the teams that rushed an autonomous agent into production to look modern.

What this means for your team

The teams that succeed start narrow, instrument heavily, and keep a human approving anything irreversible. The ones that cancel tried to make an agent autonomous before it was even reliable. Concretely:

Pick one bounded, valuable task rather than a general-purpose autonomous worker.
Constrain tools and permissions to the minimum the task needs — least privilege, not convenience.
Keep a human approving anything irreversible — money moving, data deleting, messages sending externally.
Instrument every step — inputs, outputs, cost, latency and success — so you can see drift before it becomes an outage.
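Two of the points above, least privilege and per-step instrumentation, can be sketched together as a thin wrapper around tool calls. This is a minimal illustration under stated assumptions: `InstrumentedToolbox`, `ToolNotAllowed` and the example tool names are hypothetical, and cost tracking is left out for brevity.

```python
import time


class ToolNotAllowed(PermissionError):
    """Raised when the agent asks for a tool outside its allowlist."""


class InstrumentedToolbox:
    """Hypothetical wrapper: exposes only allowed tools and logs each call."""

    def __init__(self, tools, allowed):
        # Least privilege: drop every tool the task does not explicitly need.
        self._tools = {name: fn for name, fn in tools.items() if name in allowed}
        self.log = []

    def call(self, name, **kwargs):
        if name not in self._tools:
            raise ToolNotAllowed(name)
        record = {"tool": name, "inputs": kwargs, "ok": False}
        start = time.perf_counter()
        try:
            result = self._tools[name](**kwargs)
            record["output"] = result
            record["ok"] = True
            return result
        finally:
            # Record latency and outcome whether the tool succeeded or raised.
            record["latency_s"] = time.perf_counter() - start
            self.log.append(record)


if __name__ == "__main__":
    tools = {
        "read_stock": lambda sku: 7,
        "delete_records": lambda table: None,
    }
    box = InstrumentedToolbox(tools, allowed={"read_stock"})
    print(box.call("read_stock", sku="BOLT-M8"))
    try:
        box.call("delete_records", table="orders")
    except ToolNotAllowed as err:
        print("blocked:", err)
    print(box.log)
```

Filtering the tool set at construction time means a destructive tool is simply absent for this task, rather than present and guarded by a prompt instruction.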

A useful gut-check before any agent project: ask what happens on the worst day, not the demo day. If the answer to "what if it does the wrong thing five steps in?" is "a human catches it before anything irreversible happens," you've designed for reality. If the answer is "it shouldn't do that," you haven't — you've designed for the demo, and you're a strong candidate for the 40%. The teams that ship durable agents treat unreliability as a given to be contained, not a bug to be eliminated before launch.

If you're weighing where an agent earns its keep versus where plain deterministic code is safer and cheaper, our team can help you scope it before you build — and our piece on why AI agents fail in production goes deeper on the failure modes above.

Sources

Gartner — 40% of enterprise apps will feature AI agents by 2026
Gartner — Over 40% of agentic AI projects cancelled by 2027

Written by Zain Ali
