Inspecting Models

Using the CLI

ollama show prints the model's metadata: architecture, parameter count, context length, quantization, license, and the system prompt baked in (if any).

# this is the model we used
export MODEL="granite3.3:2b"

# print the model's metadata
ollama show $MODEL

Example output:

  Model
    architecture        granite    
    parameters          2.5B       
    context length      131072     
    embedding length    2048       
    quantization        Q4_K_M     

  Capabilities
    completion    
    tools         

  License
    Apache License               
    Version 2.0, January 2004

Read this before you commit to a model. Each field tells you something that affects how the model behaves, what it can do, and whether it fits on your hardware.
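If you want one of these values in a script, you can slice it out of the text output. The sketch below defines a hypothetical helper, show_field, that assumes the two-column layout shown above (field name, two or more spaces, value). That layout is not a stable interface, so treat this as a convenience rather than something to build on.

```shell
# Hypothetical helper: print the value of one field from `ollama show`
# output read on stdin.
# Assumption: each row is "<name><2+ spaces><value>", as in the example above.
show_field() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)        # drop leading indentation
      sub(/[ \t]+$/, "", line)        # drop trailing padding
      n = split(line, cols, "  +")    # columns are separated by 2+ spaces
      if (n >= 2 && cols[1] == key) { print cols[2]; exit }
    }'
}

# Usage (needs a working Ollama install):
#   ollama show "$MODEL" | show_field "context length"
```

Splitting on two or more spaces keeps multi-word names such as "context length" intact as a single key.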

architecture is the model family the weights belong to: granite, llama, qwen2, mistral, gemma, and so on. This matters because different architectures have different prompt templates, tokenizers, and quirks. Ollama handles this for you when you use ollama run, but it's relevant if you're calling the raw API or importing a GGUF file by hand.
parameters is the model's size, counted in weights (2.5B means 2.5 billion). It's the headline number for both capability and resource cost: more parameters generally means better answers, but also more disk space, more memory, and slower inference.
context length is the maximum amount of text the model can process in one go, measured in tokens. 131072 is exactly 128K tokens (128 × 1,024), which works out to around 98,000 words of English text, assuming a token averages about 0.75 of a word.
embedding length is the size of the internal vector the model uses to represent each token, sometimes called the hidden dimension. 2048 here means every token gets turned into a 2048-number vector as it flows through the model. You don't set this and you don't tune it: it's fixed by the architecture. It's mostly useful when you're building advanced tools like semantic search or RAG using this model.
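quantization is how aggressively the weights were compressed from their original precision. Q4_K_M is a 4-bit scheme: weights that were typically 16-bit floats are stored in roughly 4 to 5 bits each, which shrinks the file and the memory footprint considerably at some cost in output quality.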
Capabilities lists what the model can do beyond plain text generation. At the time of writing, the values you'll see are completion, tools, vision, embedding, and thinking. If a capability isn't listed, the model wasn't trained for it; bolting it on at runtime usually doesn't work.
License is the legal terms attached to the model. This matters more than people think. Apache 2.0 and MIT let you use the model commercially with minimal restrictions. Llama's license has acceptable-use clauses and a 700M monthly active user threshold. Some "open" models (Gemma, certain Mistral variants) have custom licenses with their own conditions. If you're shipping a product, read this before you ship.
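The numbers above are easy to turn into back-of-the-envelope estimates. The sketch below uses the values from the example output; the 0.75 words-per-token ratio is the rule of thumb used above, and the 4.5 bits per weight for Q4_K_M is an assumption chosen for illustration, since the real average varies by model and quantization scheme.

```shell
# Values taken from the example `ollama show` output.
context_tokens=131072
parameter_count=2500000000    # the rounded "2.5B" figure

# Assumption: about 0.75 English words per token.
approx_words=$(( context_tokens * 3 / 4 ))

# Assumption: about 4.5 bits per weight for Q4_K_M (illustrative only).
# Stored as tenths of a bit because shell arithmetic is integer-only.
bits_per_weight_x10=45
weight_bytes=$(( parameter_count * bits_per_weight_x10 / 80 ))
weight_mib=$(( weight_bytes / 1024 / 1024 ))

echo "context: ~${approx_words} words"
echo "weights: ~${weight_mib} MiB, before any runtime overhead"
```

The weight estimate is a floor, not a budget: the context you actually use and the runtime itself need memory on top of it.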

ollama show also takes flags that each print a single part of the model's definition:

# the Modelfile used to build this model
ollama show $MODEL --modelfile    

# default sampling parameters
ollama show $MODEL --parameters   

# the prompt template
ollama show $MODEL --template     

# the system prompt, if set
ollama show $MODEL --system

(i) Reminder: A Modelfile is Ollama's recipe for building a model. It's a plain text file, similar in spirit to a Dockerfile. We'll cover it in detail later.
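When you're comparing two models, it can help to save each of these views to a file and diff them. A minimal sketch, assuming only the four flags shown above; the function name dump_model_views and the one-file-per-flag layout are made up for this example.

```shell
# Hypothetical helper: write each `ollama show` view to its own file.
#   $1 = model name, $2 = output directory (created if missing)
dump_model_views() {
  model="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  for view in modelfile parameters template system; do
    # One file per flag, e.g. <out_dir>/template.txt
    ollama show "$model" "--$view" > "$out_dir/$view.txt"
  done
}

# Usage (needs a working Ollama install):
#   dump_model_views "$MODEL" ./views
```

A view can come back empty, for example --system on a model with no system prompt, so an empty file is not necessarily an error.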

Using the API

The API equivalent is /api/show:

# Same model as in the CLI example
export MODEL="granite3.3:2b"

# Pull the model if you haven't already
ollama pull $MODEL

# Show details
curl -s http://localhost:11434/api/show \
  -d "{\"model\": \"$MODEL\"}" | jq .

The response is large. Abridged to the keys you'll care about, it looks like this:

{
  "license": "Apache License...",
  "modelfile": "# Modelfile generated by ...",
  "parameters": "stop \"<|end_of_text|>\"\nstop \"<|eom_id|>\"",
  "template": "{{ .System }}{{ .Prompt }}",
  "system": "",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "granite",
    "families": ["granite"],
    "parameter_size": "2.5B",
    "quantization_level": "Q4_K_M"
  },
  "model_info": {
    "general.architecture": "granite",
    "general.parameter_count": 2533539840,
    "granite.context_length": 131072,
    "granite.embedding_length": 2048,
    "granite.attention.head_count": 32,
    "granite.attention.head_count_kv": 8,
    "tokenizer.ggml.model": "gpt2"
  },
  "capabilities": ["completion", "tools"]
}

Where each ollama show field lives in the JSON:

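CLI output	API path
architecture	model_info["general.architecture"]
parameters	details.parameter_size (exact count: model_info["general.parameter_count"])
context length	model_info["granite.context_length"]
embedding length	model_info["granite.embedding_length"]
quantization	details.quantization_level
Capabilities	capabilities
License	license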
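To pull the same fields out of the JSON yourself, a jq filter is enough. The sketch below defines a hypothetical function, show_summary, that reads an /api/show response on stdin. It assumes jq is installed and that the context and embedding keys are prefixed with the architecture name, as they are in the sample response (granite.context_length), so it looks up that prefix first instead of hard-coding it.

```shell
# Hypothetical helper: summarize an /api/show response read on stdin.
# Assumption: model_info keys are prefixed with the architecture name.
show_summary() {
  jq -r '
    .model_info["general.architecture"] as $arch
    | .model_info[$arch + ".context_length"] as $ctx
    | .model_info[$arch + ".embedding_length"] as $emb
    | (.capabilities | join(", ")) as $caps
    | "architecture: \($arch)",
      "parameters: \(.details.parameter_size)",
      "context length: \($ctx)",
      "embedding length: \($emb)",
      "quantization: \(.details.quantization_level)",
      "capabilities: \($caps)"
  '
}

# Usage (needs a running Ollama server):
#   curl -s http://localhost:11434/api/show \
#     -d "{\"model\": \"$MODEL\"}" | show_summary
```

Looking the prefix up from general.architecture means the same filter should work when you switch to a model from a different family.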
