Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Building Advanced Agents: Caching

Pass 6: Cache Model Replies in Redis so Repeated Questions Come Back Instantly

This is the same agent as pass 5, but now we save every model reply in Redis. If the user asks the exact same question again (with the same model and settings), LangChain returns the saved answer immediately without calling the model. Caching pays off in production, and it is just as useful for development, demos, and any prompt you find yourself running over and over.
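
To make "exact same question, same model and settings" concrete, here is a minimal standard-library sketch of exact-match caching. It is an illustration only, not LangChain's internal key format: the `cache_key` helper, the `ExactMatchCache` class, the `llama3.2` model name, and the fake model are all hypothetical.

```python
import hashlib
import json


def cache_key(prompt: str, model: str, settings: dict) -> str:
    """Build a deterministic key from the prompt plus everything that shapes the reply."""
    payload = json.dumps(
        {"prompt": prompt, "model": model, "settings": settings},
        sort_keys=True,  # same settings in a different order give the same key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExactMatchCache:
    """In-memory stand-in for the Redis cache; counts how often the model is really called."""

    def __init__(self):
        self._store = {}
        self.model_calls = 0

    def ask(self, prompt, model, settings, call_model):
        key = cache_key(prompt, model, settings)
        if key in self._store:
            return self._store[key]  # hit: the model is not called
        self.model_calls += 1
        reply = call_model(prompt)  # miss: call the model and remember the reply
        self._store[key] = reply
        return reply


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return f"answer to: {prompt}"


cache = ExactMatchCache()
cache.ask("What is Redis?", "llama3.2", {"temperature": 0.0}, fake_model)  # miss
cache.ask("What is Redis?", "llama3.2", {"temperature": 0.0}, fake_model)  # hit
cache.ask("What is Redis?", "llama3.2", {"temperature": 0.7}, fake_model)  # miss: settings differ
print(cache.model_calls)  # 2
```

Only the second call is served from the cache. Changing the prompt by a single character, switching models, or changing a setting produces a different key and therefore a fresh model call.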

Step 1: Start a Redis Server

The easiest path to get Redis running is to use Docker. Run:

# Install docker
curl -fsSL https://get.docker.com | sh

# Run a Redis server
docker run -d \
  -p 6379:6379 \
  --name redis \
  redis/redis-stack-server:7.4.0-v8

This exposes Redis on localhost:6379, which matches the default REDIS_URL we'll set in config.py.
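
Before moving on, it can help to confirm that the server actually answers. The sketch below uses only the standard library: it opens a TCP connection, sends Redis's inline `PING` command, and checks for the `+PONG` reply. The `redis_ping` helper is a hypothetical name, and it assumes the server has no password set, as with the container started above.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis accepts this plain-text inline command
            reply = conn.recv(64)
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_ping())
```

If this prints `False`, check that the container is running (`docker ps`) and that nothing else is bound to port 6379.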

Step 2: Add the Redis Settings to config.py

Two new constants control where the cache lives and how long entries stick around:

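# config.py
import os

# Where the Redis server lives; override with the REDIS_URL environment variable.
REDIS_URL: str = os.environ.get(
    "REDIS_URL", "redis://localhost:6379"
)

# How long a cached reply stays valid, in seconds.
REDIS_CACHE_TTL: int = int(
    os.environ.get("REDIS_CACHE_TTL", "3600")
)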

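REDIS_URL points at the server we just started. REDIS_CACHE_TTL is each entry's time-to-live in seconds (default 3600, i.e. one hour); once it elapses, the entry expires and the next identical question hits the model again. Both values can be overridden through environment variables of the same name.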
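
The effect of the TTL can be sketched without Redis at all. The hypothetical `TTLCache` below is an in-memory illustration, not how Redis is implemented (Redis expires keys server-side); it takes an injectable clock so we can "fast-forward" an hour without waiting.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: int, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock() + self.ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as if it was never cached
            return None
        return value


# A fake clock we can move forward by hand.
now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])
cache.set("What is Redis?", "An in-memory data store.")

now[0] = 3599  # one second before expiry
fresh = cache.get("What is Redis?")

now[0] = 3600  # the TTL has elapsed
expired = cache.get("What is Redis?")

print(fresh)    # An in-memory data store.
print(expired)  # None
```

A short TTL keeps answers fresh at the cost of more model calls; a long one does the opposite. For development and demos, an hour is usually plenty.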

Step 3: Turn On the Global LLM Cache

LangChain has a single global setting that enables caching for every model call in the program:
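
A minimal sketch of what enabling it can look like, assuming the `langchain-redis` integration package is installed (`langchain_community.cache.RedisCache` is an alternative that takes a Redis client object instead of a URL). The `enable_redis_llm_cache` wrapper is a hypothetical helper; its defaults mirror the values in config.py.

```python
def enable_redis_llm_cache(
    redis_url: str = "redis://localhost:6379",
    ttl: int = 3600,
) -> None:
    """Route every LangChain model call through a Redis-backed cache."""
    # Imported inside the function so this file still loads where LangChain is not installed.
    from langchain_core.globals import set_llm_cache
    from langchain_redis import RedisCache  # assumption: pip install langchain-redis

    # After this call, LangChain checks Redis before calling the model
    # and stores each new reply there for `ttl` seconds.
    set_llm_cache(RedisCache(redis_url=redis_url, ttl=ttl))
```

Call it once at startup, before the first model invocation, for example `enable_redis_llm_cache(REDIS_URL, REDIS_CACHE_TTL)` with the two constants imported from config.py.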
