Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Core Concepts: From Tokens and Embeddings to Quantization and KV Cache

Embeddings

Think of embeddings as "a giant lookup book".

The model has one row per token in its vocabulary. Each row holds a long list of numbers that describes the meaning of that token in a way the model can do math on.
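To make the "lookup book" idea concrete, here is a minimal sketch of an embedding table in plain Python. The table size, the 4-number rows, and the values themselves are made up for illustration; a real model has one row for every token in its vocabulary, and each row is far longer and learned during training. The names `embedding_table` and `embed` are hypothetical.

```python
# Toy embedding table: the list index is the row number, one row per token.
# Assumption: 3 rows of 4 made-up numbers. Real rows are learned and much longer.
EMBEDDING_DIM = 4

embedding_table = [
    [0.10, -0.30, 0.25, 0.00],   # row 0
    [0.90, 0.10, -0.40, 0.20],   # row 1
    [-0.50, 0.70, 0.05, 0.30],   # row 2
]

def embed(row_numbers):
    """Return the row of numbers stored for each row number (a plain lookup)."""
    return [embedding_table[i] for i in row_numbers]

vectors = embed([2, 0])
for row in vectors:
    print(row)
```

Nothing is computed here: getting an embedding is just reading a row out of the table, which is why the "book" picture works.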

When you type "unbelievable things happen", three things happen:

Step 1: Chop the text into tokens:

A tokenizer splits your sentence into small pieces: whole words when they're common, smaller chunks when they're not. "Unbelievable" is long enough that the tokenizer often breaks it up:

["un", "believ", "able", " things", " happen"]

This is why you'll sometimes see a 3-word sentence become 5 or 6 tokens. The model doesn't have every word in its vocabulary, so it builds rare words from familiar pieces.
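The splitting step can be sketched with a toy tokenizer. This is a simplification, not how production tokenizers work: real ones typically use a learned scheme such as byte-pair encoding, while this sketch just grabs the longest matching piece from a tiny hand-written vocabulary. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical and exist only to reproduce the example above.

```python
# Assumption: a tiny hand-written vocabulary holding only the pieces from the example.
TOY_VOCAB = {"un", "believ", "able", " things", " happen"}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by repeatedly taking the longest vocabulary piece that matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking until a vocabulary piece matches.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            # Nothing in the vocabulary matches here: fall back to one character.
            tokens.append(text[pos])
            pos += 1
    return tokens

print(toy_tokenize("unbelievable things happen"))
# ['un', 'believ', 'able', ' things', ' happen']
```

Notice that the space travels with the word that follows it (" things", " happen"), which is how the token list above keeps track of word boundaries.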

Step 2: Look up each token's row number:

Every token has a fixed row number in "the model's book", called a token ID. The tokenizer doesn't think; it just looks up the number.

"un"      ->  515
"believ"  ->  67473
"able"    ->  481
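In code, this step is nothing more than a dictionary lookup. The sketch below uses only the three IDs shown above; the exact numbers depend on which tokenizer a model ships with, so treat them as illustrative. The names `TOKEN_TO_ID` and `to_ids` are hypothetical.

```python
# IDs copied from the mapping above. They are tokenizer-specific and will
# differ from model to model.
TOKEN_TO_ID = {"un": 515, "believ": 67473, "able": 481}

def to_ids(tokens, table=TOKEN_TO_ID):
    """Replace each token with its fixed row number (token ID)."""
    return [table[token] for token in tokens]

print(to_ids(["un", "believ", "able"]))  # [515, 67473, 481]
```

The same token always maps to the same ID, no matter what sentence it appears in. That is the sense in which the tokenizer "doesn't think".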
