How Fine-Tuning Works

Fine-tuning means continuing a model's training on your own examples so it learns a behavior you want. You start from a model that already knows language and nudge its weights toward your task. The question is not whether this works but how much of the model you change and at what cost.

The direct approach is to update every weight in the model. This is called full fine-tuning, and it gives you the most control. It also needs the most memory, because training has to hold not only the weights but also a gradient and optimizer state for each one. For a model with billions of weights, that adds up to far more memory than a single consumer GPU has. Full fine-tuning is real, but it belongs on rented multi-GPU machines.
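To put rough numbers on that memory cost, here is a back-of-the-envelope sketch. The 16 bytes per weight figure is an assumption, not something stated in this chapter: it is a common estimate for mixed-precision training with the Adam optimizer, and it ignores activations, so real usage is higher. The function name is made up for illustration.

```python
# Rough training-memory estimate for full fine-tuning.
# Assumption (not from the text): mixed-precision training with Adam,
# commonly costed at about 16 bytes per weight:
#   2 (fp16 weights) + 2 (fp16 gradients)
#   + 4 (fp32 master copy) + 4 + 4 (Adam's two moment estimates).
# Activations are ignored, so real usage is higher.
BYTES_PER_WEIGHT_FULL_FT = 2 + 2 + 4 + 4 + 4


def full_finetune_gb(num_weights: int) -> float:
    """Approximate GB (10**9 bytes) for weights, gradients, and optimizer state."""
    return num_weights * BYTES_PER_WEIGHT_FULL_FT / 1e9


if __name__ == "__main__":
    for billions in (1, 7, 13):
        gb = full_finetune_gb(billions * 10**9)
        print(f"{billions}B weights -> roughly {gb:.0f} GB")
```

Under these assumptions, even a 7-billion-weight model lands well above 100 GB, which is why full fine-tuning ends up on multi-GPU machines.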

A few fine-tuning techniques stand out, each with its own strengths and weaknesses: LoRA, QLoRA, reinforcement learning (RL), and others. The sections below cover the most popular ones.

LoRA (Low-Rank Adaptation)

LoRA is the method that made fine-tuning practical on normal hardware. The idea is to leave the original weights frozen and train a small set of new weights alongside them. Instead of rewriting the model, you add a thin layer that learns the difference between what the model does now and what you want it to do. Because this added layer is tiny compared to the full model, you train far fewer numbers, which means far less memory and far less time. The frozen base stays untouched, and the small trained part is called an adapter.
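The idea can be sketched in a few lines of plain Python. This is a toy illustration, not a training loop: it assumes the adapter is a pair of small matrices A and B whose product is added to the frozen matrix W, with B starting at zero so the model's behavior is unchanged before training, as in the original LoRA formulation. The function names and sizes here are made up for the example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x):
    """Compute W x + B (A x). W is the frozen base; A and B form the adapter."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]


def weight_counts(d_out, d_in, rank):
    """Return (weights in the full matrix, weights in the adapter)."""
    return d_out * d_in, rank * (d_in + d_out)


if __name__ == "__main__":
    rng = random.Random(0)
    d_out, d_in, rank = 4, 4, 1
    W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
    A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in
    B = [[0.0] * rank for _ in range(d_out)]  # d_out x rank, starts at zero
    x = [1.0, 2.0, 3.0, 4.0]
    # With B at zero, the adapter adds nothing yet.
    print(lora_forward(W, A, B, x) == matvec(W, x))
    # One 4096 x 4096 layer versus a rank-8 adapter for it.
    print(weight_counts(4096, 4096, 8))
```

For a single 4096 by 4096 matrix, a rank-8 adapter trains 65,536 numbers instead of 16,777,216, which is where the memory and time savings come from.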

LoRA vs full fine-tuning

The trade-off is that LoRA does not change the whole model, so it is best at teaching a style, a format, or a skill rather than stuffing in large amounts of new knowledge.

QLoRA (Quantized Low-Rank Adaptation)

LoRA already freezes the base model and trains only a small adapter. QLoRA adds one trick on top: it stores the frozen base in a compressed 4-bit form that takes about a quarter of the memory of the usual 16-bit weights. The base is only read during training, never changed, so compressing it costs almost no quality. The adapter still trains the normal way. The payoff is that a model which would not otherwise fit now trains on a single small GPU, including the free ones on Colab.
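A small sketch can make the compression step concrete. Note the swap: QLoRA itself uses a 4-bit format called NF4, while the code below uses a simpler uniform "absmax" scheme chosen only for readability. The function names are hypothetical, and the memory figure ignores the small overhead of storing one scale per block.

```python
def quantize_4bit(block):
    """Map floats to integer codes in [-7, 7] plus one float scale for the block.

    Simplified uniform absmax quantization, not the NF4 format QLoRA uses.
    """
    peak = max(abs(v) for v in block)
    scale = peak / 7 if peak > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in block]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floats when the frozen base is read during training."""
    return [c * scale for c in codes]


def base_weights_gb(num_weights, bits_per_weight):
    """GB (10**9 bytes) needed to store the base weights alone."""
    return num_weights * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    block = [0.45, -1.4, 0.03, 0.66]
    codes, scale = quantize_4bit(block)
    print(codes, dequantize_4bit(codes, scale))
    print(base_weights_gb(7 * 10**9, 16), base_weights_gb(7 * 10**9, 4))
```

Each restored value is off by at most half a quantization step, and the stored base shrinks from 16 bits per weight to 4, which is the "about a quarter" saving.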

QLoRA vs LoRA

QLoRA is the default way people fine-tune at home.

What You Teach: Correct Answers vs Preferences

Fine-tuning methods split by one question: does your task have a single correct answer, or only better and worse ones? That choice decides what your training examples look like.

Supervised fine-tuning, or SFT, is the one you will use. Your examples are pairs: an input and the correct output. The model learns to produce that output when it sees that kind of input. To build a model that turns plain English into SQL, you show it a question and the right query, many times, until it writes SQL on its own. SFT is how you teach a skill, a format, or a style, any task where you can write down the right answer.
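As an illustration of what such pairs might look like on disk, here is a minimal sketch that stores them as JSON Lines, one example per line. The field names `prompt` and `completion`, the table names, and the helper functions are all assumptions made for this example; the format your training tool expects may differ.

```python
import json

# Hypothetical English-to-SQL training pairs (table and column names are invented).
examples = [
    {"prompt": "How many customers are there?",
     "completion": "SELECT COUNT(*) FROM customers;"},
    {"prompt": "List the names of customers in Berlin.",
     "completion": "SELECT name FROM customers WHERE city = 'Berlin';"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def from_jsonl(text):
    """Parse JSON Lines back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(to_jsonl(examples))
```

Every record carries both halves of the pair, the input and the one correct output, which is what distinguishes SFT data from preference data.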

Preference-based fine-tuning handles the other case, where there's no single right answer, only better and worse ones. You don't hand the model a target output. You give it a comparison, and it shifts toward the kind of answer people prefer. There are two common ways to do this:

RLHF (reinforcement learning from human feedback) trains a separate model to score responses, then uses that score to steer the main model.
DPO (direct preference optimization) skips the separate scorer and learns straight from pairs labeled "preferred" and "rejected".
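To show how learning "straight from pairs" can work, here is a minimal sketch of the DPO loss for one preferred/rejected pair. It relies on details from the DPO formulation that this chapter does not cover: the model being trained is compared against a frozen reference copy of itself, and `beta` controls how strongly it may drift from that reference. The function name, the `beta` default, and the sample numbers are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair, given total log-probabilities of each response.

    "policy" is the model being trained; "ref" is a frozen reference copy.
    """
    # How much more likely each response became relative to the reference.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    # Policy identical to the reference: no preference learned yet.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has raised the preferred answer and lowered the rejected one.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the model raises the preferred response and lowers the rejected one relative to the reference, so no separate scoring model is needed.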

Preference-based methods are how labs make a model more helpful, polite, or safe, where "correct" is a matter of judgment. They are the wrong tool when a correct answer exists, so this chapter stays with SFT.

Preference tuning needs humans to judge which responses are better and to flag harmful ones. That labeling is slow, tedious work, so labs often outsource it to lower-cost workers in the Global South. The fine-tuning itself is just GPU compute; the human cost is in creating the preference data, not in running the training.
