Running Your Fine-Tuned Model in Ollama
Step 2: Create the Model in Ollama
Ollama needs two things to run your GGUF: the file itself and a Modelfile. The chat template in the Modelfile has to match the format the model was trained with, or the model produces nonsense.
The Modelfile points at your GGUF and includes Granite's chat template, the same role tags from your training step. Create it next to the GGUF:
# granite_sql_gguf_gguf/Modelfile
FROM ./granite-4.0-micro.Q4_K_M.gguf
# Granite's chat format: each turn is wrapped in role tags and ends
# with <|end_of_text|>. This must match how the model was trained.
TEMPLATE """{{ if .System }}<|start_of_role|>system<|end_of_role|>{{ .System }}<|end_of_text|>
{{ end }}{{ if .Prompt }}<|start_of_role|>user<|end_of_role|>{{ .Prompt }}<|end_of_text|>
{{ end }}<|start_of_role|>assistant<|end_of_role|>"""
# Stop generating when the model emits the end tag, so it returns
# one answer instead of running on.
PARAMETER stop "<|end_of_text|>"
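To see what the model actually receives, it can help to render the template by hand. The snippet below is a minimal Python sketch that mirrors the TEMPLATE above. The function name `render_granite_prompt` and the sample strings are made up for illustration; Ollama does this rendering itself with Go templates, so treat this as an approximation for checking the format against your training data, not as Ollama's implementation.

```python
def render_granite_prompt(prompt: str, system: str = "") -> str:
    """Approximate the Modelfile TEMPLATE for a single turn (illustrative only)."""
    parts = []
    # {{ if .System }} ... {{ end }}: the system turn is emitted only when set.
    if system:
        parts.append(
            f"<|start_of_role|>system<|end_of_role|>{system}<|end_of_text|>\n"
        )
    # {{ if .Prompt }} ... {{ end }}: the user turn, closed with the end tag.
    if prompt:
        parts.append(
            f"<|start_of_role|>user<|end_of_role|>{prompt}<|end_of_text|>\n"
        )
    # The template ends with an open assistant tag, so the model writes the answer next.
    parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the rendered shape.
    print(render_granite_prompt("List all customers.", system="You write SQL."))
```

If the printed text does not look like a training example up to the assistant tag, the template and the training format have drifted apart.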
Now create the model from this Modelfile:
# Change into the model folder
