What Ollama Is, and What It Is Not

This is the part that confuses newcomers, so it is worth pinning down at the start of this book. Local AI is not a single tool but a layered stack, and Ollama occupies only one of those layers.

The following table lists each layer of the stack, what it does, and examples of the tools that occupy it.

Layer	What it does	Examples
Model weights	What the model learned, stored as a file	Llama, Qwen, Gemma, Mistral (distributed as GGUF, safetensors, etc.)
Inference engine	Does the actual math on CPU or GPU	llama.cpp, MLX
Runtime / server	Manages models, exposes APIs, handles loading and unloading	Ollama, LM Studio, llama-server, vLLM
Client / interface
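To make the layering concrete, here is a minimal sketch of a client talking to the runtime layer. It assumes Ollama is running on its default local address, `http://localhost:11434`, and calls its `/api/generate` endpoint with streaming turned off. The helper names `build_generate_request` and `generate` are hypothetical, and only the Python standard library is used. The point is that the client sends plain HTTP and never touches the weights or the inference engine directly.

```python
import json
import urllib.request

# Assumption: the Ollama server is listening on its default local address.
OLLAMA_HOST = "http://localhost:11434"


def build_generate_request(
    model: str, prompt: str, host: str = OLLAMA_HOST
) -> urllib.request.Request:
    """Build a non-streaming POST to the runtime layer's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    model: str, prompt: str, host: str = OLLAMA_HOST, timeout: float = 120.0
) -> str:
    """Send the request and return the generated text.

    The client only speaks HTTP to the runtime; the runtime is responsible
    for loading the model weights and handing the math to its engine.
    """
    request = build_generate_request(model, prompt, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the full completion arrives in the "response" field.
    return body["response"]
```

With the server running and a model already pulled, a call such as `generate("llama3.2", "What is an inference engine?")` returns the model's answer as a string; `llama3.2` here is only an example tag, so substitute whichever model you have installed. Swapping Ollama for another runtime would change the URL and payload, but not the fact that the client sits above the runtime in the stack.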

Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware