RAG vs fine-tuning: which one do you actually need?

When a model doesn't know your business, there are two common fixes: retrieval-augmented generation (RAG) and fine-tuning. Teams often jump to fine-tuning because it sounds more powerful — like you're really teaching the model your domain rather than bolting something on the side. Usually, it's the wrong first move, and choosing wrong is expensive in time, money and flexibility.

What each does

RAG leaves the model alone and feeds it the right context at query time — store knowledge as searchable chunks, retrieve the relevant ones per request, and pass them into the prompt.
Fine-tuning changes the model's weights by training on your examples — it bakes in tone, format and behaviour so the model produces them without being told.

The clearest way to keep them straight: RAG changes what the model knows at the moment it answers; fine-tuning changes how the model behaves by default. One is a library card, the other is years of training.

The rule of thumb

Reach for RAG when the problem is knowledge: "answer questions about our docs/policies." Facts change; update an index, not the model.
Reach for fine-tuning when the problem is behaviour: a consistent format, a niche classification, a particular voice the prompt can't reliably produce.

Most "the model doesn't know X" problems are knowledge problems — which is why RAG solves the majority of real cases, and fine-tuning is often a costly answer to a question nobody asked.

Why RAG usually comes first

RAG wins on the dimensions that matter most in production. Freshness: when a policy changes, you re-index a document in minutes rather than retraining. Traceability: because the answer is grounded in retrieved passages, you can cite the source — essential when a user, auditor or regulator asks "where did that come from?" Cost and reversibility: there's no training run to pay for and no baked-in behaviour to undo if you change your mind. Fine-tuning, by contrast, produces a static artefact: the day you fine-tune is the day your model's knowledge starts going stale, and updating it means doing the whole exercise again.

A concrete example

You want an internal assistant that answers questions about your HR handbook. Fine-tuning on the handbook seems direct — until the parental-leave policy changes next quarter and the model confidently quotes the old one, with no way to show where it got the answer. A RAG system retrieves the current handbook section at query time, answers from it, and links the user to the exact clause. When the policy changes, you update one document. That is not a close call.

What RAG doesn't fix

RAG is the right default, but it is not magic, and pretending otherwise is how RAG projects disappoint. Its quality is capped by retrieval: if the system fetches the wrong passage, the model answers confidently from the wrong context, and the failure looks exactly like a hallucination. That makes the unglamorous work — chunking documents sensibly, choosing a good embedding model, and evaluating whether retrieval actually surfaces the relevant text — the part that determines whether the feature works. Teams that treat RAG as "just stuff the docs in a vector database" usually discover this the hard way. Budget real effort for the retrieval layer and measure it directly, the same way you'd build an evaluation harness for any AI feature.

Fine-tuning has its own honest costs worth naming: you need a quality dataset of examples, the training run itself, and a commitment to repeat the exercise whenever you want to change the behaviour. Those costs are sometimes worth paying — but you should pay them with eyes open, for a problem you've confirmed retrieval can't solve, rather than reflexively because fine-tuning sounds more serious.

What this means for an engineering team

Default to RAG. It solves the common case faster, cheaper and more transparently.
Reach for fine-tuning only after RAG falls short — and only for a proven behaviour gap, like a strict output format or a specialised classification retrieval can't fix.
You can combine them. A fine-tuned model that handles format, fed by RAG that handles facts, is a legitimate and powerful pattern once you've earned the complexity.

Pick the cheap, reversible tool first and let evidence justify the expensive, permanent one. If you're weighing this for a real project, let's talk it through.

Sources

Stanford HAI — 2025 AI Index (on model capability and cost trends)

Written by Zain Ali

Start a project →

RAG vs fine-tuning: which one do you actually need?

What each does

The rule of thumb

Why RAG usually comes first

A concrete example

What RAG doesn't fix

What this means for an engineering team

Sources

Keep reading

“Attention Is All You Need”, explained for non-engineers

The paper that introduced RAG, explained simply

The METR study, explained: why AI made experienced developers slower