Feedback

Chat Icon

Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Running Models and Understanding How They Work inside Ollama
32%

Ollama Conversation Flow

When you chat with a model using a client like ollama run $MODEL, the API, or a web interface like Open WebUI, here's what actually happens end to end in 7 steps:

  1. Your client (CLI, curl, SDK, or Open WebUI) sends a request to the Ollama server at localhost:11434.

  2. The server looks up the model on disk, pulls it if missing, and renders your messages into a prompt using the model's chat template.

  3. The server hands the prompt off to a llama.cpp runner process, spawning one if the model isn't already loaded.

Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Enroll now to unlock all content and receive all future updates for free.