What Is a Token?

A token is a chunk of text: the unit a language model actually reads and writes. Models don't see characters or words directly; they see tokens.

A token can be a whole word, part of a word, a single character, or even a space. The split depends on the model's tokenizer (the tool that does the chopping).

A common rule of thumb for English is: 1 token is roughly 4 characters or 0.75 words. So 1,000 tokens is about 750 words.
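The rule of thumb can be turned into a quick estimator. The sketch below is only an approximation built on the two ratios stated above; the constant and function names are made up for illustration, and actual counts depend on the tokenizer and the text.

```python
# Rough token estimates based on the rule of thumb above (English text only).
CHARS_PER_TOKEN = 4      # assumption: ~4 characters per token
WORDS_PER_TOKEN = 0.75   # assumption: ~0.75 words per token


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_words_from_tokens(tokens: int) -> int:
    """Estimate how many English words a given token budget covers."""
    return round(tokens * WORDS_PER_TOKEN)


print(estimate_words_from_tokens(1000))  # 750
print(estimate_tokens_from_chars("Tokens are chunks of text."))
```

An estimate like this is useful for budgeting prompt length, but for exact numbers you need to run the model's own tokenizer.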

Examples from a typical tokenizer:

| Text | Tokens |
| --- | --- |
| `hello` | `["hello"]` (1 token) |
| `unbelievable` | `["un", "believ", "able"]` (3 tokens) |
| `Paris` | `["Paris"]` (1 token) |
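To make the splitting concrete, here is a toy tokenizer that reproduces the examples above. It is a simplified sketch, not how any particular model works: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are invented for illustration, and the greedy longest-match strategy stands in for the learned rules (such as byte-pair encoding merges) that production tokenizers typically use.

```python
# Toy vocabulary, invented for this example. Real vocabularies are far
# larger and are learned from data rather than written by hand.
TOY_VOCAB = {"hello", "un", "believ", "able", "Paris", " "}


def toy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split text by repeatedly taking the longest matching vocabulary entry."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink it.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # Nothing in the vocabulary matched: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("hello"))         # ['hello']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(toy_tokenize("hello Paris"))   # ['hello', ' ', 'Paris']
```

The single-character fallback illustrates why text the tokenizer rarely saw, such as unusual names or typos, tends to cost more tokens than common words.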
