Model Sampling
How FastMCP Implements Sampling
In FastMCP, sampling is split cleanly between two sides: the server requests a completion, and the client fulfills it.
The server never touches an LLM directly. Instead, it calls ctx.sample() from inside a tool, hands off a prompt (and optionally a system prompt, model preferences, or conversation history), and awaits the result. The client receives that request, routes it to whatever LLM it has configured, and returns the generated text. From the server's perspective, it is just waiting for a string to come back.
The server side requires nothing beyond a Context parameter in your tool and an await:
from fastmcp import FastMCP, Context

mcp = FastMCP()

@mcp.tool
async def summarize(content: str, ctx: Context) -> str:
    """Generate a summary of the provided content."""
    result = await ctx.sample(f"Please summarize this:\n\n{content}")
    return result.text