Practical MCP with FastMCP & LangChain

Engineering the Agentic Experience

Augmenting RAG Agents with MCP Servers

Building the PDF Q&A Agent

Let's put the theory into practice. We will build a small agent that reads a PDF using an MCP tool, chunks and embeds the text into a FAISS index, and then answers questions using only the relevant parts of the document.

Follow along with the steps below to create the agent. The full code is also available in the companion kit.

Step 1: Create the Project

mkdir -p $HOME/workspace/langchain/langchain_rag_agent
cd $HOME/workspace/langchain/langchain_rag_agent

uv init --bare --python 3.12

Step 2: Install Dependencies

Here is what's needed in addition to the libraries we've already used in previous chapters:

  • langchain-community includes the FAISS vector store wrapper.

  • langchain-text-splitters provides the chunking logic.

  • faiss-cpu is the FAISS library itself.

Run the following command to add them:

uv add \
    "langchain-community" \
    "langchain-text-splitters" \
    "langchain-openai" \
    "langchain-mcp-adapters" \
    "faiss-cpu" \
    "python-dotenv"

There are many alternative libraries for chunking, embedding, and vector storage. The ones we chose are just examples among many options.

Step 3: Add Your API Key

cat > .env <<EOF
OPENAI_API_KEY=your_openai_api_key_here
EOF

Step 4: Write the Agent

Now, we're going to create agent.py. Let's walk through it block by block.

Imports

import asyncio

from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

load_dotenv()

  • FAISS is LangChain's wrapper around the Facebook AI Similarity Search library — our vector store. RecursiveCharacterTextSplitter does the chunking.

  • OpenAIEmbeddings converts text into vectors, and ChatOpenAI is the LLM that will answer questions.

  • MultiServerMCPClient connects to the MCP server that reads PDFs, which we first introduced in the previous chapter.

Connect to the PDF MCP Server

client = MultiServerMCPClient(
    {
        "pdf-reader": {
            "transport": "stdio",
            "command": "npx",
            "args": ["@sylphx/pdf-reader-mcp"],
        }
    }
)
tools = await client.get_tools()
read_pdf = next(t for t in tools if t.name == "read_pdf")

We are going to use an external MCP server maintained by SylphxAI, an AI platform-as-a-service provider.

@sylphx/pdf-reader-mcp is an npm package that exposes an MCP tool called read_pdf. It accepts a file path or URL and returns the extracted text. The npx command downloads and runs the package on the fly — no global install needed. For external MCP servers like this one, you first need to read the documentation to understand how to call the tool and what input it expects. In this case, it wants a list of sources, each with a path property, and an optional include_full_text flag.

The server runs as a subprocess communicating over stdio, just like the FastMCP server we wrote in the previous chapter. It acts purely as a PDF extraction layer: it reads and parses the PDF so we don't have to deal with PDF parsing libraries in Python. The text it returns is then chunked and embedded locally — the MCP server never sees our questions or the LLM's responses.
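
Under the hood, the stdio transport carries JSON-RPC 2.0 messages over the subprocess's stdin and stdout. You never build these by hand — the adapter does it for you — but as an illustration, a tool call looks roughly like this (the file path here is a placeholder):

```python
import json

# Illustrative sketch of the JSON-RPC 2.0 message the MCP client sends
# over the server's stdin when we invoke the read_pdf tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_pdf",
        "arguments": {
            "sources": [{"path": "example.pdf"}],  # placeholder path
            "include_full_text": True,
        },
    },
}

# One JSON message per line on the wire.
wire = json.dumps(request)
print(wire)
```

The server replies with a matching JSON-RPC response on stdout, which the adapter unwraps into the tool's return value.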

We grab the read_pdf tool by name from the list because the server might expose other tools in the future, and we only want this specific one.

Extract PDF Text

pdf_path = input("PDF path: ").strip()

print("Reading PDF...")
raw = await read_pdf.ainvoke(
    {"sources": [{"path": pdf_path}], "include_full_text": True}
)
full_text = raw if isinstance(raw, str) else str(raw)

Here we call the MCP tool directly with read_pdf.ainvoke(...) instead of letting an agent decide when to call it. This is a different pattern from the previous chapters — we are using an MCP tool as a plain async function, not wiring it into a ReAct loop. This makes sense here because there's no decision to make: we always want to read the PDF first before doing anything else.

The isinstance check is a safety guard — the tool should return a string, but if it returns a structured object we convert it to text so the rest of the pipeline doesn't break.
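
If you want a slightly more defensive version of that guard, a small normalizer can flatten the common shapes a tool result might take. The list-of-blocks and dict-with-"text" cases below are illustrative assumptions about possible shapes, not a documented contract of the adapter:

```python
def to_text(raw):
    """Flatten a tool result into plain text.

    Results are usually strings; this also handles lists of content
    blocks and dicts carrying a "text" field (assumed shapes), falling
    back to str() for anything else.
    """
    if isinstance(raw, str):
        return raw
    if isinstance(raw, list):
        return "\n".join(to_text(part) for part in raw)
    if isinstance(raw, dict) and "text" in raw:
        return str(raw["text"])
    return str(raw)

print(to_text("hello"))
print(to_text([{"text": "page 1"}, {"text": "page 2"}]))
```

With a helper like this, the extraction line becomes `full_text = to_text(raw)`.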

Chunk and Index

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(full_text)
index = FAISS.from_texts(chunks, OpenAIEmbeddings())
print(f"Ready — {len(chunks)} chunks indexed.\n")

  • RecursiveCharacterTextSplitter splits the text into chunks of roughly 500 characters with a 50-character overlap between consecutive chunks. "Recursive" means it tries to split on paragraph boundaries first, then sentences, then words — it keeps chunks as semantically coherent as possible rather than chopping at arbitrary character positions.

  • FAISS.from_texts() does two things in one call: it sends each chunk to the OpenAI embedding API to get a vector, then stores all the vectors in an in-memory FAISS index.

After this line, the index is ready to semantically search for relevant chunks based on user questions.
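
To make the "recursive" idea concrete, here is a toy re-implementation of the splitting strategy — a simplified sketch, not the library's actual algorithm (no chunk overlap, and a fixed separator ladder):

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Toy recursive splitter: try the coarsest separator first and
    fall back to finer ones only for pieces that are still too long.
    (Simplified sketch; the real splitter also handles overlap.)"""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut at chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Split on the current separator, recursing into oversized pieces.
    pieces = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            pieces.append(piece)
        else:
            pieces.extend(recursive_split(piece, chunk_size, tuple(rest)))
    # Greedily merge adjacent pieces back together up to chunk_size.
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

doc = "First paragraph.\n\nSecond paragraph that is quite a bit longer than the first one."
for chunk in recursive_split(doc, chunk_size=40):
    print(repr(chunk))
```

Notice how the short first paragraph survives intact while the long second one gets broken at word boundaries — splitting coarse-to-fine is what keeps chunks semantically coherent.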

Q&A Loop

This is RAG in action:

llm = ChatOpenAI(model="gpt-5-mini")
while True:
    question = input("You: ").strip()
    if not question or question.lower() in {"exit", "quit"}:
        break

    docs = index.similarity_search(question, k=5)
    context = "\n\n---\n\n".join(d.page_content for d in docs)

    answer = llm.invoke(
        f"Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    print(f"Agent: {answer.content}\n")

For each question the user types, similarity_search(question, k=5) embeds the question and returns the 5 chunks whose vectors are closest to it. Those chunks are joined into a single context string with --- separators so the model can see the boundaries.

The prompt tells the model to answer "based only on the context below" — this is the grounding instruction that keeps the model from making up information. The model reads the retrieved chunks, formulates an answer, and we print it.
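
Conceptually, similarity_search is just nearest-neighbor lookup over vectors. Stripped of FAISS's indexing optimizations, the retrieval step looks roughly like this sketch, where tiny 3-dimensional vectors stand in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    # Score every chunk vector against the query and keep the k closest.
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

chunks = ["refund policy", "shipping times", "warranty terms"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]  # pretend embeddings
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "how do refunds work?"

for i in top_k(query_vec, chunk_vecs, k=2):
    print(chunks[i])
```

Real embeddings have hundreds or thousands of dimensions, and FAISS exists precisely so this scoring doesn't have to scan every vector linearly at scale.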

Complete Agent Code

Here is how the final code looks in its entirety. Use the following command to create agent.py:

cat > agent.py <<'EOF'
# agent.py
import asyncio

from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

load_dotenv()


async def main():
    client = MultiServerMCPClient(
        {
            "pdf-reader": {
                "transport": "stdio",
                "command": "npx",
                "args": ["@sylphx/pdf-reader-mcp"],
            }
        }
    )
    tools = await client.get_tools()
    read_pdf = next(t for t in tools if t.name == "read_pdf")

    pdf_path = input("PDF path: ").strip()

    print("Reading PDF...")
    raw = await read_pdf.ainvoke(
        {"sources": [{"path": pdf_path}], "include_full_text": True}
    )
    full_text = raw if isinstance(raw, str) else str(raw)

    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(full_text)
    index = FAISS.from_texts(chunks, OpenAIEmbeddings())
    print(f"Ready — {len(chunks)} chunks indexed.\n")

    llm = ChatOpenAI(model="gpt-5-mini")
    while True:
        question = input("You: ").strip()
        if not question or question.lower() in {"exit", "quit"}:
            break

        docs = index.similarity_search(question, k=5)
        context = "\n\n---\n\n".join(d.page_content for d in docs)

        answer = llm.invoke(
            f"Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
        )
        print(f"Agent: {answer.content}\n")


asyncio.run(main())
EOF
