There's a seductive promise in AI-first development: ship in an afternoon what used to take a quarter. The data says it's half-true — and the missing half is where budgets die.
Starting has never been cheaper
Per Stanford's 2025 AI Index, GPT-3.5-level inference fell from $20 to $0.07 per million tokens between late 2022 and late 2024, a drop of more than 280-fold in roughly two years. Prototyping is genuinely cheap. A capability that would have been an absurd line item two years ago is now rounding-error money to try. That's a real and underrated shift: the cost of experimenting has collapsed, and you should experiment freely.
The trap is mistaking the cost of the experiment for the cost of the product. They are not the same number, and the gap between them is where most AI budgets quietly go wrong.
Keeping it is where the bill arrives
- Most pilots never pay off: MIT found 95% of generative-AI pilots deliver no measurable P&L impact; McKinsey found only 39% of organisations report any EBIT impact from AI, mostly under 5%.
- Inference scales with usage forever, and the IEA expects AI data-centre power to more than quadruple by 2030.
- Models drift, prompts rot, evals need upkeep — none of it is one-time.
Traditional development front-loads the pain: high upfront cost, low tail. AI-first development inverts it — cheap to start, and the meter never stops.
The shape of the cost curve is the whole point. Traditional software is mostly a build cost: you pay engineers to write it, then it runs nearly for free. AI-first software flips that. The build is cheap; the operation is where the money lives — every request costs tokens, every model upgrade risks a regression you have to re-test, every prompt slowly rots as the world it describes changes underneath it. None of those are one-time costs, and almost none of them show up in the proof-of-concept that got the project funded.
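To make the two curves concrete, here is a minimal sketch that compares cumulative spend for a front-loaded project and a cheap-to-start one. The `CostProfile` class, the `crossover_month` helper and every figure in it are illustrative assumptions, not numbers from the reports cited here; it also treats the monthly cost as flat, which understates a bill that grows with usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostProfile:
    """Upfront build cost plus a flat monthly running cost (both hypothetical)."""
    name: str
    upfront: float
    monthly: float

    def cumulative(self, months: int) -> float:
        """Total spend after `months` months of operation."""
        return self.upfront + self.monthly * months


def crossover_month(
    cheap_start: CostProfile, cheap_tail: CostProfile, horizon: int = 120
) -> Optional[int]:
    """First month in which the cheap-to-start option has cost more in total.

    Returns None if that never happens within `horizon` months.
    """
    for month in range(1, horizon + 1):
        if cheap_start.cumulative(month) > cheap_tail.cumulative(month):
            return month
    return None


# Illustrative numbers only: swap in your own estimates.
traditional = CostProfile("traditional", upfront=300_000, monthly=2_000)
ai_first = CostProfile("ai-first", upfront=20_000, monthly=18_000)

month = crossover_month(ai_first, traditional)
print(f"AI-first becomes the more expensive option in month {month}")
```

With these made-up inputs the cheap start is overtaken in month 18. The useful part is not the number but the habit: plot both curves over the feature's expected lifetime, not just its launch quarter.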
A concrete example
Say you ship an AI feature that classifies and routes incoming support tickets. The prototype takes two days and costs almost nothing to run on test data. Then real traffic arrives: tens of thousands of tickets a day, each a model call, multiplied by retries and the occasional re-classification. Now add the work nobody costed — monitoring quality, updating the prompt when a new product line launches, re-validating when the vendor ships a new model version, and paying an engineer to own all of it. The two-day build now carries a permanent monthly bill. That bill is fine if you measured the value it produces — and a slow leak if you didn't.
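The ticket-routing bill can be roughed out in a few lines. This is a sketch under stated assumptions: the `FeatureCosts` class and all of its defaults (volume, tokens per call, vendor price, retry rate, upkeep hours, hourly rate) are hypothetical placeholders to be replaced with your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureCosts:
    """Monthly cost inputs for one AI feature. Every default is a hypothetical placeholder."""
    tickets_per_day: int = 50_000
    tokens_per_call: int = 1_500            # prompt + completion, assumed
    price_per_million_tokens: float = 3.00  # assumed vendor price in USD
    retry_rate: float = 0.10                # extra calls from retries and re-classification
    upkeep_hours: float = 40.0              # monitoring, prompt updates, re-validation
    hourly_rate: float = 100.0              # assumed loaded engineering cost in USD
    days_per_month: int = 30

    def tickets_per_month(self) -> int:
        return self.tickets_per_day * self.days_per_month

    def inference_cost(self) -> float:
        """Token spend: every ticket is a model call, inflated by the retry rate."""
        calls = self.tickets_per_month() * (1 + self.retry_rate)
        return calls * self.tokens_per_call / 1_000_000 * self.price_per_million_tokens

    def upkeep_cost(self) -> float:
        """The work nobody costed: someone has to own the feature."""
        return self.upkeep_hours * self.hourly_rate

    def total_cost(self) -> float:
        return self.inference_cost() + self.upkeep_cost()

    def break_even_value_per_ticket(self) -> float:
        """Value each routed ticket must produce for the feature to pay for itself."""
        return self.total_cost() / self.tickets_per_month()


costs = FeatureCosts()
print(f"Inference: ${costs.inference_cost():,.0f}/month")
print(f"Upkeep:    ${costs.upkeep_cost():,.0f}/month")
print(f"Total:     ${costs.total_cost():,.0f}/month")
print(f"Break-even value per ticket: ${costs.break_even_value_per_ticket():.4f}")
```

The break-even figure is the one to take to the budget meeting: if you cannot say what a correctly routed ticket is worth, you cannot say whether the monthly total is a bargain or a leak.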
The hidden tail nobody costed
Beyond the obvious per-token spend, three costs reliably surprise teams. Drift is the first: a prompt that performed well at launch slowly degrades as the world it describes changes — new products, new edge cases, new phrasing from users — and quality erodes without any code changing. Vendor churn is the second: model providers deprecate versions, change pricing, and ship "upgrades" that can quietly regress your particular task, each of which forces re-testing you didn't schedule. The third is the energy floor. Inference isn't just a financial cost; it's a physical one. The IEA expects AI data-centre power to more than quadruple by 2030, and at production volume that translates directly into a cost that scales with every request you serve, forever. Efficiency stops being an environmental nicety and becomes a line-item discipline — the architecture that uses fewer, smaller, cached calls is also the one with the smaller bill.
This is why "cheap to start" is such a dangerous phrase. The starting cost is the one number that's genuinely fallen 280-fold. Almost every other cost in an AI feature is recurring, scales with usage, and was invisible in the prototype that won the budget.
The pattern that works
1. AI at the edges, deterministic code at the core. Let plain code do anything that can be specified exactly; reserve the model for the genuinely fuzzy parts.
2. Build the [eval harness](/blog/eval-harness-for-llm-features) before you scale. You can't manage a recurring quality cost you can't measure.
3. Abstract the vendor so swapping models is config, not a rewrite. Pricing and quality both move, and you want to follow them cheaply.
4. Budget the tail, not just the launch. Forecast inference, monitoring and upkeep as ongoing line items from day one.
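The first three points fit in one small sketch, reusing the ticket-routing example. Everything named here is hypothetical: `TicketClassifier`, `KeywordStubModel`, `MODEL_REGISTRY` and the routing rules are stand-ins, and the stub does keyword matching where a real adapter would call a vendor SDK.

```python
from typing import Callable, Dict, List, Optional, Protocol, Tuple


class TicketClassifier(Protocol):
    """Anything that maps ticket text to a queue name (hypothetical interface)."""
    def classify(self, text: str) -> str: ...


class KeywordStubModel:
    """Stand-in for a vendor-backed model; a real adapter would call the vendor here."""
    def classify(self, text: str) -> str:
        return "billing" if "charge" in text.lower() else "general"


# Config-driven registry: swapping vendors means changing a key, not the callers.
MODEL_REGISTRY: Dict[str, Callable[[], TicketClassifier]] = {
    "stub": KeywordStubModel,
}


def route_by_rule(text: str) -> Optional[str]:
    """Deterministic core: cases that can be specified exactly never reach the model."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "password reset" in lowered:
        return "account"
    return None


def route(text: str, model: TicketClassifier) -> Tuple[str, str]:
    """Return (queue, source), where source records whether a rule or the model decided."""
    queue = route_by_rule(text)
    if queue is not None:
        return queue, "rule"
    return model.classify(text), "model"


def accuracy(examples: List[Tuple[str, str]], model: TicketClassifier) -> float:
    """Tiny eval: share of labelled tickets routed to the expected queue."""
    hits = sum(route(text, model)[0] == expected for text, expected in examples)
    return hits / len(examples)


config = {"classifier": "stub"}  # hypothetical config value
model = MODEL_REGISTRY[config["classifier"]]()

labelled = [
    ("I need a refund for my order", "billing"),
    ("Password reset link never arrived", "account"),
    ("Why was there a double charge?", "billing"),
    ("The app crashes on launch", "technical"),
]
print(f"accuracy: {accuracy(labelled, model):.2f}")
```

Two of the four sample tickets never touch the model, which is the cost argument for rules first. The labelled set is the seed of an eval harness: rerun it whenever the prompt, the rules or the registry entry changes, and a regression shows up as a number instead of a support escalation.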
What this means for your team
Treat AI features the way you'd treat hiring, not the way you'd treat buying a tool: a recurring commitment with a running cost, justified by a measurable return. Before you scale anything, know its per-request cost, its volume, and the workflow value it produces — so you can compare them honestly. The teams that win here aren't the ones who started cheapest; they're the ones who budgeted for the meter that never stops. If you want a second pair of eyes on the real total cost of an AI feature before you commit, that's a good conversation to have with our team.
Sources
- Stanford HAI — 2025 AI Index
- MIT NANDA — The GenAI Divide
- IEA — Energy and AI