Practical MCP with FastMCP & LangChain

Engineering the Agentic Experience

Deploying FastMCP in Production

Running FastMCP in Production

To run MCP servers over HTTP (Streamable HTTP transport) for remote, multi-client access, you typically expose an ASGI app via mcp.http_app() and serve it with Uvicorn, which lets you use multiple workers and integrate with existing infrastructure.

When scaling horizontally behind a load balancer, you must account for how FastMCP handles sessions.

You choose one of two operating models, each with distinct capabilities and trade-offs:

Stateful mode (default)

By default, Streamable HTTP maintains server-side sessions in memory. Sessions enable stateful MCP features including:

  • Session state: Persistent data across tool calls (ctx.get_state(), ctx.set_state())
  • Elicitation: Interactive user input during tool execution (ctx.elicit())
  • Sampling: LLM text generation requests from tools (ctx.sample())

In stateful mode, each client gets a unique session ID, and all requests from that client must route to the same server instance to access the session data.

For single-instance deployments, stateful mode needs no extra configuration, because every request naturally reaches the same server.

With multi-instance deployments, stateful mode requires session affinity (sticky sessions) so that every request from a given client reaches the server instance holding that session.
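To make the affinity requirement concrete, here is a minimal toy simulation. It is not FastMCP code: the Instance class and its methods are hypothetical stand-ins for a server process that keeps sessions in an in-memory dict. It only illustrates why round-robin routing breaks in-memory sessions while sticky routing does not.

```python
import itertools
import uuid


class Instance:
    """Hypothetical stand-in for one server process with in-memory sessions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: dict[str, dict] = {}

    def initialize(self) -> str:
        # Create a session that exists only in this instance's memory.
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {}
        return session_id

    def handle(self, session_id: str) -> str:
        # A request succeeds only if this instance created the session.
        return "ok" if session_id in self.sessions else "unknown session"


instances = [Instance("a"), Instance("b")]
session_id = instances[0].initialize()  # the session is created on instance "a"

# Round-robin routing: requests alternate between "a" and "b".
rotation = itertools.cycle(instances)
round_robin_results = [next(rotation).handle(session_id) for _ in range(4)]

# Sticky routing: the balancer remembers which instance owns the session.
owner = {session_id: instances[0]}
sticky_results = [owner[session_id].handle(session_id) for _ in range(4)]

print(round_robin_results)  # every second request misses the session
print(sticky_results)       # every request finds the session
```

In this sketch, half of the round-robin requests land on an instance that has never seen the session, which is the failure that session affinity is meant to prevent.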

For the key-value data stored via ctx.get_state() and ctx.set_state(), FastMCP supports external storage backends via the session_state_store parameter. This distributes the state data across instances:

from fastmcp import FastMCP
from key_value.aio.stores.redis import RedisStore

mcp = FastMCP(
    "My Server",
    session_state_store=RedisStore(url="redis://localhost:6379")
)
app = mcp.http_app()

With a shared session store, state data persists across all server instances. However, session_state_store only externalizes the key-value state; the MCP transport session itself still lives in memory on whichever instance created it. You therefore still need session affinity at the load-balancer level to route subsequent requests to that same instance.
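The split between shared key-value state and per-instance transport sessions can be sketched with a small toy model. This is an illustration under stated assumptions, not FastMCP internals: SharedStore stands in for an external backend such as Redis, and Instance, open_session, and call_tool are hypothetical names.

```python
class SharedStore:
    """Hypothetical stand-in for an external key-value backend such as Redis."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], object] = {}

    def put(self, session_id: str, key: str, value: object) -> None:
        self._data[(session_id, key)] = value

    def get(self, session_id: str, key: str) -> object:
        return self._data.get((session_id, key))


class Instance:
    """Hypothetical server instance: shared state, local transport sessions."""

    def __init__(self, store: SharedStore) -> None:
        self.store = store                        # shared across all instances
        self.transport_sessions: set[str] = set()  # lives only in this process

    def open_session(self, session_id: str) -> None:
        self.transport_sessions.add(session_id)

    def call_tool(self, session_id: str, key: str, value: object = None) -> object:
        # The transport session must exist locally before any state is touched.
        if session_id not in self.transport_sessions:
            raise LookupError("transport session not found on this instance")
        if value is not None:
            self.store.put(session_id, key, value)
        return self.store.get(session_id, key)


store = SharedStore()
a, b = Instance(store), Instance(store)
a.open_session("s1")
a.call_tool("s1", "counter", 1)

# The key-value state is visible from any instance through the shared store...
state_seen_from_b = b.store.get("s1", "counter")

# ...but a request routed to "b" still fails, because "b" has no transport session.
try:
    b.call_tool("s1", "counter")
    routed_to_b = "ok"
except LookupError as exc:
    routed_to_b = str(exc)

print(state_seen_from_b, "|", routed_to_b)
```

The data written through instance a is readable from instance b's store handle, yet the request itself is rejected on b, which is why the shared store alone does not remove the need for affinity.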

Some MCP clients, including Cursor and Claude Code, use fetch() internally and do not properly forward Set-Cookie headers. Without cookies, a load balancer that relies on cookie-based affinity cannot identify which instance should handle subsequent requests, so session-dependent features fail. Because of this limitation, stateless mode is recommended for horizontally scaled deployments.

For long-running tools that may exceed load balancer timeout limits, enable EventStore for resumable streaming:

from fastmcp.server.event_store import EventStore
from key_value.aio.stores.redis import
