Silent Truncation: The Trap

When the rendered prompt exceeds num_ctx, Ollama does not return an error, and at the default log level it does not warn either. It quietly drops the oldest messages from the front of the messages array (preserving system messages where it can) until the prompt fits, then runs the model on whatever is left. The log line that records the truncation is emitted at debug level in server/prompt.go, so unless the server was started with debug logging enabled (OLLAMA_DEBUG=1), nothing shows up in its output.
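The front-truncation behavior described above can be modeled in a few lines. This is a simplified sketch, not Ollama's Go code: it assumes a hypothetical whitespace "tokenizer" (the real server counts tokens of the fully rendered chat template), and it always keeps system messages and the newest message, even when they alone overflow the window.

```python
def count_tokens(text):
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def truncate_front(messages, num_ctx, count=count_tokens):
    """Drop the oldest non-system messages until the rest fits in num_ctx.

    Returns (kept_messages, dropped_count). The newest message is always kept.
    """
    if not messages:
        return [], 0

    def window(start):
        # System messages before `start` are carried along; everything from `start` on is kept.
        kept = [m for m in messages[:start] if m["role"] == "system"] + messages[start:]
        return kept, sum(count(m["content"]) for m in kept) <= num_ctx

    start = len(messages) - 1
    # Walk backwards from the newest message, growing the kept window while it still fits.
    for i in range(len(messages) - 1, -1, -1):
        _, fits = window(i)
        if not fits:
            break
        start = i
    kept, _ = window(start)
    return kept, len(messages) - len(kept)
```

With a four-message chat and a small budget, the oldest user turn disappears while the system prompt survives, and nothing in the return path signals that anything was lost unless the caller checks the dropped count.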

There is a GitHub issue tracking this exact UX problem and a proposed fix to upgrade the log to WARN. Until that lands, you have to look for it yourself.
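One way to look for it yourself is to check on the client side before a request is sent. The sketch below sets num_ctx explicitly in the request's options and refuses to build a /api/chat payload whose estimated size would not fit. The 4-characters-per-token ratio and the per-message overhead are rough assumptions, not values from Ollama; a real check should use the model's own tokenizer.

```python
import math

# Assumptions for a rough pre-flight estimate, not Ollama's tokenizer:
CHARS_PER_TOKEN = 4        # crude average for English text
PER_MESSAGE_OVERHEAD = 8   # hypothetical allowance for chat-template tokens


def estimate_tokens(messages):
    """Estimate prompt tokens for a list of {"role": ..., "content": ...} dicts."""
    total = 0
    for m in messages:
        total += math.ceil(len(m["content"]) / CHARS_PER_TOKEN) + PER_MESSAGE_OVERHEAD
    return total


def build_chat_request(model, messages, num_ctx, headroom=0.9):
    """Return an /api/chat payload, or raise if the prompt likely exceeds num_ctx.

    headroom reserves part of the window for the reply, since num_ctx
    has to hold both the prompt and the generated tokens.
    """
    estimated = estimate_tokens(messages)
    budget = int(num_ctx * headroom)
    if estimated > budget:
        raise ValueError(
            f"prompt is ~{estimated} tokens but the budget is {budget} "
            f"(num_ctx={num_ctx}); Ollama would silently drop older messages"
        )
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # set the window explicitly per request
    }
```

The returned dict can be serialized with json.dumps and POSTed to the server's /api/chat endpoint (http://localhost:11434 by default). Failing loudly on the client is the point: it turns a silent server-side truncation into an error you actually see.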

To see truncation happen live on a Linux install where Ollama runs as a systemd service, start a small model such as granite3.3:2b, paste a prompt longer than its context window, and follow the server log:

journalctl -u ollama -f --no-pager | grep -i "truncat"

You will see lines like:

level=WARN source
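If you would rather not rely on grep, the sketch below parses each line as key=value pairs (the Go slog text format that the level=WARN fragment above suggests) and reports the level and message of any entry whose msg mentions truncation. The exact wording of Ollama's message is an assumption here, so it matches any msg containing "truncat"; the script name find_truncation.py is hypothetical.

```python
import shlex
import sys


def parse_logfmt(line):
    """Split a key=value log line (Go slog text format) into a dict."""
    try:
        tokens = shlex.split(line)  # honors quoted values such as msg="a b"
    except ValueError:
        tokens = line.split()       # unbalanced quotes: fall back to a plain split
    fields = {}
    for tok in tokens:
        key, sep, value = tok.partition("=")
        if sep:  # ignore journald prefixes like "ollama[123]:" that have no '='
            fields[key] = value
    return fields


def find_truncation(lines):
    """Yield (level, msg) for log lines whose msg mentions truncation."""
    for line in lines:
        fields = parse_logfmt(line)
        msg = fields.get("msg", "")
        if "truncat" in msg.lower():
            yield fields.get("level", "?"), msg


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Usage: journalctl -u ollama -f --no-pager | python3 find_truncation.py -
    for level, msg in find_truncation(sys.stdin):
        print(f"[{level}] {msg}", flush=True)
```

Reporting the level alongside the message also tells you whether your Ollama build still logs truncation at DEBUG or has moved it to WARN.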

Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware
