Building Advanced Agents: Caching
Pass 6: Cache Model Replies in Redis so Repeated Questions Come Back Instantly
Same agent as pass 5, but now every model reply is saved in Redis. If the user asks exactly the same question again (with the same model and settings), LangChain returns the saved answer immediately instead of calling the model. This pays off in production, and just as much during development, in demos, and for any prompt you find yourself running over and over.
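To make "exactly the same question" concrete, here is a minimal, self-contained sketch of an exact-match reply cache. It is a plain-Python stand-in, not LangChain's implementation: the names `make_cache_key` and `CachedModel` are hypothetical, and a Python dict plays the role Redis plays in this pass. The point it illustrates is that the prompt, the model name, and the settings together form the lookup key, so changing any one of them is a cache miss.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def make_cache_key(prompt: str, model: str, settings: Dict[str, Any]) -> str:
    """Hypothetical helper: hash prompt + model + settings into one key."""
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} give the same key
    payload = json.dumps(
        {"prompt": prompt, "model": model, "settings": settings},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedModel:
    """Hypothetical wrapper: consult a dict before calling the real model."""

    def __init__(self, model: str, settings: Dict[str, Any],
                 call_model: Callable[[str], str]) -> None:
        self.model = model
        self.settings = settings
        self.call_model = call_model  # the expensive call we want to skip
        self.cache: Dict[str, str] = {}
        self.model_calls = 0  # counts real model calls, for illustration

    def ask(self, prompt: str) -> str:
        key = make_cache_key(prompt, self.model, self.settings)
        if key in self.cache:  # cache hit: return the saved reply
            return self.cache[key]
        self.model_calls += 1  # cache miss: call the model, then save the reply
        reply = self.call_model(prompt)
        self.cache[key] = reply
        return reply


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        return f"answer to: {prompt}"

    llm = CachedModel("example-model", {"temperature": 0.0}, fake_model)
    llm.ask("What is Redis?")
    llm.ask("What is Redis?")   # identical question: served from the cache
    llm.ask("What is Redis ?")  # one extra space: a miss, the model is called
    print(llm.model_calls)  # 2
```

A one-character difference in the prompt is a miss, which is why caching helps most with prompts you rerun verbatim. LangChain's cache interface has the same shape: `BaseCache.lookup(prompt, llm_string)`, where `llm_string` describes the model and its parameters.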
Step 1: Start a Redis Server
The easiest way to get Redis running is with Docker. Run:
# Install Docker (Linux convenience script; skip if Docker is already installed)
curl -fsSL https://get.docker.com | sh
# Run a Redis server in the background
docker run -d \
  -p 6379:6379 \
  --name redis \
  redis/redis-stack-server:7.4.0-v8
This exposes Redis on localhost:6379, which matches the default REDIS_URL in config.py.
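Before wiring Redis into the agent, it can help to confirm the server actually answers. The sketch below uses only the standard library: it opens a TCP connection to the host and port in a `redis://` URL and sends a `PING` command encoded in Redis's RESP protocol, expecting the reply `+PONG`. The function name `redis_is_up` is hypothetical. If you already have the `redis` Python package installed, `redis.Redis.from_url(REDIS_URL).ping()` is the more usual check.

```python
import socket
from urllib.parse import urlparse


def redis_is_up(url: str = "redis://localhost:6379", timeout: float = 2.0) -> bool:
    """Hypothetical helper: return True if a Redis server at `url` answers PING."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379  # 6379 is Redis's default port
    # RESP encoding of PING: an array holding one bulk string of length 4
    ping = b"*1\r\n$4\r\nPING\r\n"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(ping)
            reply = sock.recv(64)
    except OSError:
        # connection refused, timed out, host unreachable, ...
        return False
    # a healthy server replies with the simple string +PONG
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

A `False` here usually means the container is not running or the port mapping differs from 6379.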
Step 2: Add the Redis Settings to config.py
Two new constants control where the cache lives and how long entries stick around:
# config.py
import os  # needed for os.environ (skip if config.py already imports it)

REDIS_URL: str = os.environ.get(
    "REDIS_URL", "redis://localhost:6379"
)
REDIS_CACHE_TTL: int = int(
    os.environ.get("REDIS_CACHE_TTL", "3600")
)
REDIS_URL points at the server we just started. REDIS_CACHE_TTL is the time-to-live in seconds (default 1 hour); after this, an entry expires and the next identical question will hit the model again.
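The TTL rule is simple to state, and a tiny simulation shows exactly when an entry stops counting. The sketch below is a plain-Python stand-in for what Redis does with an expiring key: each entry remembers when it was stored, and a lookup at or beyond `ttl` seconds later is treated as a miss. `TTLCache` is a hypothetical name, and the clock is injectable so the example can fast-forward time instead of waiting an hour. Redis itself handles expiry server-side (for example via the `EX` option of `SET`), so the application never has to do this bookkeeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: int = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so examples can simulate time passing
        self._data: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        # store the value together with the moment it was written
        self._data[key] = (self.clock(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            # expired: drop it so the next identical question hits the model
            del self._data[key]
            return None
        return value


if __name__ == "__main__":
    now = [0.0]  # a fake clock we move forward by hand
    cache = TTLCache(ttl=3600, clock=lambda: now[0])
    cache.set("What is Redis?", "An in-memory data store.")
    now[0] = 3599.0
    print(cache.get("What is Redis?"))  # still cached
    now[0] = 3600.0
    print(cache.get("What is Redis?"))  # None: the entry has expired
```

A shorter TTL keeps answers fresher at the cost of more model calls; a longer one saves more calls but can serve stale replies if prompts or tools change.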
Step 3: Turn On the Global LLM Cache
LangChain has a single global setting that enables caching for every model call in the program:
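Conceptually, a global cache is a module-level object that every model call consults before doing any work, which is why flipping one setting affects the whole program. The plain-Python sketch below mimics that pattern with hypothetical `set_cache` and `call_model` functions; it is an illustration, not LangChain's code, and for brevity it keys only on the prompt (a real cache also keys on the model and its settings). In LangChain the equivalent switch is `set_llm_cache()` from `langchain_core.globals`; for this pass the object passed to it would presumably be a Redis-backed cache configured from `REDIS_URL` and `REDIS_CACHE_TTL` (for example, assuming the `langchain-redis` integration package, its `RedisCache` class).

```python
from typing import Callable, Dict, Optional

# Module-level slot shared by every caller in the program (hypothetical names).
_GLOBAL_CACHE: Optional[Dict[str, str]] = None


def set_cache(cache: Optional[Dict[str, str]]) -> None:
    """Turn caching on (pass a store) or off (pass None) for every call."""
    global _GLOBAL_CACHE
    _GLOBAL_CACHE = cache


def call_model(prompt: str, model_fn: Callable[[str], str]) -> str:
    """Every model call goes through here, so all of them share one cache."""
    cache = _GLOBAL_CACHE
    if cache is not None and prompt in cache:
        return cache[prompt]  # hit: no model call at all
    reply = model_fn(prompt)
    if cache is not None:
        cache[prompt] = reply  # miss: remember the reply for next time
    return reply


if __name__ == "__main__":
    calls = []

    def slow_model(prompt: str) -> str:
        calls.append(prompt)  # record each real model call
        return prompt[::-1]

    set_cache({})  # one line enables caching everywhere
    call_model("hello", slow_model)  # e.g. from the planner
    call_model("hello", slow_model)  # e.g. from a tool: served from the cache
    print(len(calls))  # 1
```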
