Step 2: Create the Model in Ollama

Ollama needs two things to run your GGUF: the file itself and a Modelfile. The prompt format in the Modelfile has to match the one the model was trained with, or the model produces nonsense.

The Modelfile points at your GGUF and includes Granite's chat template, which uses the same role tags as your training step. Create it next to the GGUF:

# granite_sql_gguf_gguf/Modelfile

FROM ./granite-4.0-micro.Q4_K_M.gguf

# Granite's chat format: each turn is wrapped in role tags and ends
# with <|end_of_text|>. This must match how the model was trained.
TEMPLATE """{{ if .System }}<|start_of_role|>system<|end_of_role|>{{ .System }}<|end_of_text|>
{{ end }}{{ if .Prompt }}<|start_of_role|>user<|end_of_role|>{{ .Prompt }}<|end_of_text|>
{{ end }}<|start_of_role|>assistant<|end_of_role|>"""

# Stop generating when the model emits the end tag, so it returns
# one answer instead of running on.
PARAMETER stop "<|end_of_text|>"
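
To make the template concrete, here is a minimal Python sketch of the string it should expand to for a single turn. It is only an illustration: `render_granite_prompt` is a hypothetical helper, not part of Ollama, and it assumes Ollama substitutes `.System` and `.Prompt` into the template exactly as written, including the newline after each `<|end_of_text|>`.

```python
# Role tags copied from the TEMPLATE in the Modelfile.
START = "<|start_of_role|>"
END_ROLE = "<|end_of_role|>"
END_TEXT = "<|end_of_text|>"


def render_granite_prompt(prompt: str, system: str = "") -> str:
    """Hypothetical helper: mimic the Modelfile TEMPLATE for one turn."""
    parts = []
    # {{ if .System }} ... {{ end }}: emitted only when a system message is set.
    if system:
        parts.append(f"{START}system{END_ROLE}{system}{END_TEXT}\n")
    # {{ if .Prompt }} ... {{ end }}: emitted only when a user prompt is set.
    if prompt:
        parts.append(f"{START}user{END_ROLE}{prompt}{END_TEXT}\n")
    # The template always ends with an open assistant tag, so the model
    # continues from here and stops when it emits <|end_of_text|>.
    parts.append(f"{START}assistant{END_ROLE}")
    return "".join(parts)


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(render_granite_prompt("List all users.", system="You write SQL."))
```

Comparing this output with one formatted example from your training data is a quick way to spot a mismatch in tags or line breaks before you create the model.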

Now create the model from this Modelfile:

# Change into the model folder
