Pass 6: Cache Model Replies in Redis so Repeated Questions Come Back Instantly

This is the same agent as in pass 5, but now we save every model reply in Redis. If the user asks the exact same question again (with the same model and settings), LangChain returns the saved answer immediately without calling the model. That pays off in production, and just as much in development, demos, and any prompt you find yourself running over and over.
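To make "exact same question, same model, same settings" concrete, here is a toy in-memory sketch of exact-match caching. It is not LangChain's implementation or its real key scheme; `ExactMatchCache`, `fake_model`, and `demo-model` are made-up names for illustration, and a plain dict stands in for Redis.

```python
import hashlib
import json


class ExactMatchCache:
    """Toy in-memory stand-in for the Redis cache (illustration only)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(prompt, model, settings):
        # Any change to the prompt, the model, or the settings gives a new key.
        payload = json.dumps([prompt, model, settings], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt, model, settings, call_model):
        key = self._key(prompt, model, settings)
        if key not in self._store:
            # Cache miss: call the model once and remember the reply.
            self._store[key] = call_model(prompt)
        return self._store[key]


calls = []  # records every prompt that actually reached the "model"


def fake_model(prompt):
    # Hypothetical stand-in for a real model call.
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = ExactMatchCache()
settings = {"temperature": 0}

first = cache.get_or_call("What is Redis?", "demo-model", settings, fake_model)
second = cache.get_or_call("What is Redis?", "demo-model", settings, fake_model)
# A trailing space makes this a different prompt, so it misses the cache.
third = cache.get_or_call("What is Redis? ", "demo-model", settings, fake_model)

print(f"model was called {len(calls)} times for 3 requests")
```

The second request never reaches `fake_model`, while the third does because its prompt differs by one character. Exact-match caching only helps when the input is byte-for-byte identical.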

Step 1: Start a Redis Server

The easiest way to get Redis running is with Docker:

# Install Docker (skip this if Docker is already installed)
curl -fsSL https://get.docker.com | sh

# Run a Redis server
docker run -d \
  -p 6379:6379 \
  --name redis \
  redis/redis-stack-server:7.4.0-v8

This exposes Redis on localhost:6379, which matches the default REDIS_URL we'll set in config.py.
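Before moving on, it can help to confirm that the server really is answering on that port. The sketch below uses only the standard library: it opens a TCP connection and sends Redis's inline `PING` command, expecting `+PONG` back. The function name `redis_is_up` is made up for this example, and the check assumes the server has no password set, which is the case for the container started above.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=1.0):
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts simple "inline" commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
```

Calling `redis_is_up()` with no arguments checks localhost:6379. A `False` result usually means the container is not running or the port mapping is missing.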

Step 2: Add the Redis Settings to `config.py`

Two new constants control where the cache lives and how long entries stick around:

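# config.py
import os  # needed for os.environ; skip if config.py already imports it

# Where the cache lives; override with the REDIS_URL environment variable.
REDIS_URL: str = os.environ.get(
    "REDIS_URL", "redis://localhost:6379"
)

# How long a cached reply lives, in seconds.
REDIS_CACHE_TTL: int = int(
    os.environ.get("REDIS_CACHE_TTL", "3600")
)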

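REDIS_URL points at the server we just started. REDIS_CACHE_TTL is the time-to-live in seconds (default 3600, i.e. 1 hour); once it elapses, the entry expires and the next identical question hits the model again. Both values are read from environment variables of the same name, so you can override them without editing the file.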
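To see how the defaults and overrides resolve, here is a small sketch that applies the same lookup logic to an explicit mapping instead of the real environment. `load_redis_settings` is a hypothetical helper, not part of config.py, and the positive-TTL check is an extra assumption added for this example.

```python
def load_redis_settings(env):
    """Resolve (url, ttl_seconds) from a mapping such as os.environ."""
    url = env.get("REDIS_URL", "redis://localhost:6379")
    # int() raises ValueError on non-numeric input such as "1h".
    ttl = int(env.get("REDIS_CACHE_TTL", "3600"))
    if ttl <= 0:
        # Assumption for this sketch: treat a non-positive TTL as a mistake.
        raise ValueError("REDIS_CACHE_TTL must be a positive number of seconds")
    return url, ttl


# No variables set: fall back to the defaults from config.py.
defaults = load_redis_settings({})

# A short TTL is handy in development when prompts change often.
short_lived = load_redis_settings({"REDIS_CACHE_TTL": "60"})

print(defaults)
print(short_lived)
```

Note that the TTL must be a plain integer number of seconds; a value like `1h` fails at import time with a `ValueError` rather than silently falling back to the default.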

Step 3: Turn On the Global LLM Cache

LangChain has a single global setting that enables caching for every model call in the program: