Augmenting RAG Agents with MCP Servers
Building the PDF Q&A Agent
Let's put the theory into practice. We will build a small agent that reads a PDF using an MCP tool, chunks and embeds the text into a FAISS index, and then answers questions using only the relevant parts of the document.
Follow along with the steps below to create the agent. The full code is also available in the companion kit.
Step 1: Create the Project
mkdir -p $HOME/workspace/langchain/langchain_rag_agent
cd $HOME/workspace/langchain/langchain_rag_agent
uv init --bare --python 3.12
Step 2: Install Dependencies
Here is what's needed in addition to the libraries we've already used in previous chapters:
- langchain-community includes the FAISS vector store wrapper.
- langchain-text-splitters provides the chunking logic.
- faiss-cpu is the FAISS library itself.
Run the following command to add them:
uv add \
"langchain-community" \
"langchain-text-splitters" \
"langchain-openai" \
"langchain-mcp-adapters" \
"faiss-cpu" \
"python-dotenv"
There are many alternative libraries for chunking, embedding, and vector storage. The ones we chose are just examples among many options.
Step 3: Add Your API Key
cat > .env <<EOF
OPENAI_API_KEY=your_openai_api_key_here
EOF
Step 4: Write the Agent
Now, we're going to create agent.py. Let's walk through it block by block.
Imports
import asyncio
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
load_dotenv()
- FAISS is LangChain's wrapper around the Facebook AI Similarity Search library — our vector store.
- RecursiveCharacterTextSplitter does the chunking.
- OpenAIEmbeddings converts text into vectors, and ChatOpenAI is the LLM that will answer questions.
- MultiServerMCPClient connects to the MCP server that reads PDFs, which we first introduced in the previous chapter.
Connect to the PDF MCP Server
client = MultiServerMCPClient(
{
"pdf-reader": {
"transport": "stdio",
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
}
}
)
tools = await client.get_tools()
read_pdf = next(t for t in tools if t.name == "read_pdf")
We are going to use an external MCP server maintained by SylphxAI, an AI platform-as-a-service provider.
@sylphx/pdf-reader-mcp is an npm package that exposes an MCP tool called read_pdf. It accepts a file path or URL and returns the extracted text. The npx command downloads and runs the package on the fly — no global install needed. For external MCP servers like this one, read the documentation first to understand how to call the tool and what input it expects. In this case, it wants a list of sources, each with a path property, and an optional include_full_text flag.
The server runs as a subprocess communicating over stdio, just like the FastMCP server we wrote in the previous chapter. It acts purely as a PDF extraction layer: it reads and parses the PDF so we don't have to deal with PDF parsing libraries in Python. The text it returns is then chunked and embedded locally — the MCP server never sees our questions or the LLM's responses.
We grab the read_pdf tool by name from the list because the server might expose other tools in the future, and we only want this specific one. Note that client.get_tools() is awaited, so this block — like the rest of the agent — must run inside an async function, which we hand to asyncio.run() at the end.
Extract PDF Text
pdf_path = input("PDF path: ").strip()
print("Reading PDF...")
raw = await read_pdf.ainvoke(
{"sources": [{"path": pdf_path}], "include_full_text": True}
)
full_text = raw if isinstance(raw, str) else str(raw)
Here we call the MCP tool directly with read_pdf.ainvoke(...) instead of letting an agent decide when to call it. This is a different pattern from the previous chapters — we are using an MCP tool as a plain async function, not wiring it into a ReAct loop. This makes sense here because there's no decision to make: we always want to read the PDF first before doing anything else.
The isinstance check is a safety guard — the tool should return a string, but if it returns a structured object we convert it to text so the rest of the pipeline doesn't break.
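If you want something sturdier than a blind str() cast, a small helper can flatten the shapes MCP tools commonly return. This is a hypothetical sketch (the helper name and the handled shapes are assumptions, not part of the pdf-reader server's documented contract):

```python
import json


def to_text(result) -> str:
    """Flatten a tool result into plain text.

    Handles common MCP result shapes: a plain string, a list of
    content blocks with a "text" field, or an arbitrary dict,
    which we serialize to JSON as a last resort.
    """
    if isinstance(result, str):
        return result
    if isinstance(result, list):
        # Join text blocks like [{"type": "text", "text": "..."}].
        return "\n".join(
            block.get("text", "") if isinstance(block, dict) else str(block)
            for block in result
        )
    if isinstance(result, dict):
        return json.dumps(result)
    return str(result)
```

With a helper like this, the pipeline keeps working even if a server update changes the return shape from a string to structured content blocks.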
Chunk and Index
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(full_text)
index = FAISS.from_texts(chunks, OpenAIEmbeddings())
print(f"Ready — {len(chunks)} chunks indexed.\n")
RecursiveCharacterTextSplitter splits the text into chunks of roughly 500 characters with a 50-character overlap between consecutive chunks. "Recursive" means it tries to split on paragraph boundaries first, then sentences, then words — it keeps chunks as semantically coherent as possible rather than chopping at arbitrary character positions.
FAISS.from_texts() does two things in one call:
- It sends each chunk to the OpenAI embedding API to get a vector,
- Then stores all the vectors in an in-memory FAISS index.
After this line, the index is ready to semantically search for relevant chunks based on user questions.
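To make chunk_size and chunk_overlap concrete, here is a deliberately simplified, character-only version of overlapping chunking. The real RecursiveCharacterTextSplitter is smarter — it prefers paragraph and sentence boundaries — but the sliding-window idea is the same:

```python
def naive_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Fixed-size character windows that overlap by `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


# Each chunk repeats the last `overlap` characters of the previous one,
# so text cut at a boundary still appears whole in at least one chunk.
print(naive_chunks("abcdefghij", size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap is what prevents a sentence that straddles a chunk boundary from being split in a way that makes both halves unretrievable.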
Q&A Loop
This is RAG in action:
llm = ChatOpenAI(model="gpt-5-mini")
while True:
question = input("You: ").strip()
if not question or question.lower() in {"exit", "quit"}:
break
docs = index.similarity_search(question, k=5)
context = "\n\n---\n\n".join(d.page_content for d in docs)
answer = llm.invoke(
f"Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
print(f"Agent: {answer.content}\n")
For each question the user types, similarity_search(question, k=5) embeds the question and returns the 5 chunks whose vectors are closest to it. Those chunks are joined into a single context string with --- separators so the model can see the boundaries.
The prompt tells the model to Answer based only on the context below — this is the grounding instruction that stops the model from making up information. The model reads the retrieved chunks, formulates an answer, and we print it.
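Under the hood, "closest vectors" typically means cosine similarity (or inner product on normalized vectors). Here is a toy, stdlib-only version of what similarity_search does, assuming we already have embeddings as plain lists of floats — FAISS does the same thing, just with optimized index structures:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


# The vector pointing the same way as the query ranks first.
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]], k=2))  # → [1, 0]
```

In the real agent, the query vector comes from embedding the user's question with the same OpenAIEmbeddings model used to index the chunks — embeddings are only comparable when they come from the same model.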
Complete Agent Code
Here is how the final code looks in its entirety. Use the following command to create agent.py:
cat > agent.py <<'EOF'
# agent.py
import asyncio

from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

load_dotenv()


async def main():
    # Connect to the PDF MCP server and grab the read_pdf tool.
    client = MultiServerMCPClient(
        {
            "pdf-reader": {
                "transport": "stdio",
                "command": "npx",
                "args": ["@sylphx/pdf-reader-mcp"],
            }
        }
    )
    tools = await client.get_tools()
    read_pdf = next(t for t in tools if t.name == "read_pdf")

    # Extract the PDF text via the MCP tool.
    pdf_path = input("PDF path: ").strip()
    print("Reading PDF...")
    raw = await read_pdf.ainvoke(
        {"sources": [{"path": pdf_path}], "include_full_text": True}
    )
    full_text = raw if isinstance(raw, str) else str(raw)

    # Chunk the text and build the in-memory FAISS index.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(full_text)
    index = FAISS.from_texts(chunks, OpenAIEmbeddings())
    print(f"Ready — {len(chunks)} chunks indexed.\n")

    # Q&A loop: retrieve the most relevant chunks, then answer from them.
    llm = ChatOpenAI(model="gpt-5-mini")
    while True:
        question = input("You: ").strip()
        if not question or question.lower() in {"exit", "quit"}:
            break
        docs = index.similarity_search(question, k=5)
        context = "\n\n---\n\n".join(d.page_content for d in docs)
        answer = llm.invoke(
            f"Answer based only on the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        print(f"Agent: {answer.content}\n")


asyncio.run(main())
EOF
