Few AI debates are as tribal — and as practical to resolve — as open-weight versus closed (API) models. The online version of the argument tends to be about ideology, openness and who gets to control the future. The version that matters for a team shipping a product is much more boring and much more answerable. With capability gaps narrowing and inference costs collapsing, it's a build decision, not a belief.
First, the terms. A closed model is one you reach over an API: a vendor hosts it, you send requests, and you never see the weights. An open-weight model is one whose parameters you can download and run yourself, on your own hardware or in a cloud you control. (Note: "open weight" isn't always "open source". You get the weights, but not necessarily the training data, the training code, or a licence that lets you do anything you like with them.)
Closed (API) models
- Pros: frontier capability out of the box, zero ops, fast to start, and upgrades that arrive with no engineering effort on your side as the vendor improves the model.
- Cons: a per-token cost that never goes away, your data leaving your boundary on every call, vendor and pricing risk, and behaviour that can change underneath you when the provider ships a new version.
Open-weight models
- Pros: run on your own infrastructure or even on-device, full data control, no per-token fee once the hardware is paid for (power and operations still cost money), and the ability to pin a version that never changes without your say-so.
- Cons: you now own the hosting, scaling, monitoring and tuning; reaching frontier-level quality needs real infrastructure and real expertise.
Why this matters
The decision compounds. Per-token API costs are trivial in a pilot and can become one of your largest line items once a feature succeeds and traffic grows — the same trap covered in the AI energy bill. Data sensitivity can turn an easy API call into a compliance problem the moment regulated or confidential information is involved. And vendor lock-in quietly removes your leverage: if your whole product is wired to one provider's quirks, a price change or a behaviour change becomes your emergency, not theirs.
The honest answer is usually both: a closed frontier model for the hardest, lowest-volume requests, and a smaller open model self-hosted for the high-volume or sensitive ones. (Often the open one can be a small language model that's more than good enough for the narrow task.)
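One way to picture the "both" answer is a small routing function that sends each request to one backend or the other. The sketch below is only an illustration: the `Request` fields, the backend labels and the rule order are assumptions made for this example, and a real system would need its own way of deciding what counts as sensitive or hard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request shape; both flags are assumptions for this sketch."""
    text: str
    contains_sensitive_data: bool  # e.g. regulated or confidential input
    needs_frontier_quality: bool   # e.g. set by the caller or a simple classifier


def choose_backend(req: Request) -> str:
    """Return 'self_hosted_open' or 'closed_api' for a single request."""
    # Sensitive input never crosses the boundary, however hard the task is.
    if req.contains_sensitive_data:
        return "self_hosted_open"
    # Only the hard, low-volume requests pay the per-token price.
    if req.needs_frontier_quality:
        return "closed_api"
    # The high-volume default stays on the smaller self-hosted model.
    return "self_hosted_open"


if __name__ == "__main__":
    print(choose_backend(Request("Draft a legal strategy memo", False, True)))
    print(choose_backend(Request("Classify this support ticket", False, False)))
```

Checking sensitivity first is a deliberate choice in this sketch: it means a request that is both hard and sensitive still stays in-house, which trades some quality for data control.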
How to actually decide
Decide on three axes — not on which camp you belong to:
- Control — do you need to pin behaviour, run offline, or keep the model from changing under you?
- Cost at your volume — model the bill at projected scale, not at pilot scale, where everything looks cheap.
- Data sensitivity — does the input contain anything that can't cross your boundary?
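The cost axis is the easiest of the three to put numbers on. The sketch below compares a monthly API bill with a monthly self-hosting bill at pilot volume and at projected volume. Every figure in it (price per million tokens, GPU hourly rate, requests a GPU can serve per day, fixed ops cost) is a made-up placeholder rather than a quote from any vendor, and the model ignores real-world factors such as peak load, redundancy and engineering time.

```python
import math


def monthly_api_cost(requests_per_day: int, tokens_per_request: int,
                     usd_per_million_tokens: float, days: int = 30) -> float:
    """Per-token bill: scales linearly with traffic."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


def monthly_self_hosted_cost(requests_per_day: int, requests_per_gpu_per_day: int,
                             usd_per_gpu_hour: float, fixed_ops_usd: float,
                             days: int = 30) -> float:
    """GPU rental plus a fixed ops cost: scales in steps, with a high floor."""
    gpus = max(1, math.ceil(requests_per_day / requests_per_gpu_per_day))
    return gpus * usd_per_gpu_hour * 24 * days + fixed_ops_usd


# All values below are hypothetical placeholders; substitute your own.
TOKENS_PER_REQUEST = 2_000
USD_PER_MILLION_TOKENS = 5.0
REQUESTS_PER_GPU_PER_DAY = 100_000
USD_PER_GPU_HOUR = 2.5
FIXED_OPS_USD = 15_000.0

if __name__ == "__main__":
    for label, volume in [("pilot", 1_000), ("projected", 500_000)]:
        api = monthly_api_cost(volume, TOKENS_PER_REQUEST, USD_PER_MILLION_TOKENS)
        hosted = monthly_self_hosted_cost(volume, REQUESTS_PER_GPU_PER_DAY,
                                          USD_PER_GPU_HOUR, FIXED_OPS_USD)
        print(f"{label:>9}: API ${api:,.0f}/month, self-hosted ${hosted:,.0f}/month")
```

With these placeholder inputs the API costs $300 a month against $16,800 for self-hosting at pilot volume, and $150,000 against $24,000 at projected volume. The exact numbers mean nothing; the point is that the ranking can flip between pilot and scale, so the comparison has to be run at the volume you expect to reach.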
The single most important move, whichever way you lean, is to abstract the model behind your own interface. If your code talks to a thin internal layer rather than directly to a vendor SDK, switching models — or running an open and a closed one side by side — stays a configuration change, not a rewrite. That keeps the decision reversible, which matters because the landscape is moving fast: today's clear winner may be next quarter's expensive mistake. Treat it as an engineering trade-off you can revisit, not a flag you plant once.
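A minimal sketch of such a thin internal layer might look like the following. The names `TextModel`, `ClosedApiModel`, `SelfHostedModel` and `build_model` are invented for this example, and both backends are stubs that return a tagged string: a real implementation would call a vendor SDK or a local inference server inside `complete`.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only interface application code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class ClosedApiModel:
    """Stub for a vendor-hosted model; a real one would call the vendor SDK here."""
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        return f"[{self.model_name} via API] {prompt}"


class SelfHostedModel:
    """Stub for an open-weight model pinned to an explicit version."""
    def __init__(self, model_name: str, pinned_version: str) -> None:
        self.model_name = model_name
        self.pinned_version = pinned_version

    def complete(self, prompt: str) -> str:
        return f"[{self.model_name}@{self.pinned_version} self-hosted] {prompt}"


def build_model(config: dict) -> TextModel:
    """Pick a backend from configuration, so switching is not a code change."""
    backend = config["backend"]
    if backend == "closed_api":
        return ClosedApiModel(config["model_name"])
    if backend == "self_hosted":
        return SelfHostedModel(config["model_name"], config["pinned_version"])
    raise ValueError(f"unknown backend: {backend!r}")


def summarise(model: TextModel, document: str) -> str:
    """Application code: knows about TextModel, not about any vendor."""
    return model.complete(f"Summarise: {document}")


if __name__ == "__main__":
    config = {"backend": "self_hosted", "model_name": "small-open-model",
              "pinned_version": "1.0"}
    print(summarise(build_model(config), "quarterly report"))
```

Because `summarise` only sees the `TextModel` interface, changing the `backend` value in the configuration swaps the model without touching application code, and two backends can be built side by side for comparison.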
Sources
- Stanford HAI — 2025 AI Index