Multiple Models in Memory at Once

Ollama can keep more than one model loaded at the same time, up to the limit set by OLLAMA_MAX_LOADED_MODELS. The default is 3 times the number of GPUs, or 3 on CPU-only systems. In practice, the real limit is whether all the models fit in memory.

# Allow up to 5 models loaded at once
# edit the unit file with `systemctl edit ollama`

[Service]
Environment
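
The listing above is cut off at the Environment line. As a hedged sketch, a complete drop-in for this setting might look like the following. It assumes Ollama runs as a systemd service named ollama and that the variable is passed through systemd's Environment= directive. The value 5 is taken from the comment above; the exact line is an assumption, not the original listing.

```ini
# Drop-in override, opened with `systemctl edit ollama`
# Assumption: the service is named "ollama" and reads its settings from the environment

[Service]
# Allow up to 5 models loaded at once (they still have to fit in memory)
Environment="OLLAMA_MAX_LOADED_MODELS=5"
```

After saving the override, the service typically has to be reloaded and restarted (`systemctl daemon-reload`, then `systemctl restart ollama`) before the new limit takes effect.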
