Model Sampling
How FastMCP Implements Sampling
In FastMCP, sampling is split cleanly between two sides: the server requests a completion, and the client fulfills it.
The server never touches an LLM directly. Instead, it calls ctx.sample() from inside a tool, hands off a prompt (and optionally a system prompt, model preferences, or conversation history), and awaits the result. The client receives that request, routes it to whatever LLM it has configured, and returns the generated text. From the server's perspective, it is just waiting for a string to come back.
The server side requires nothing beyond a Context parameter in your tool and an await:
from fastmcp import FastMCP, Context

mcp = FastMCP()

@mcp.tool
async def summarize(content: str, ctx: Context) -> str:
    """Generate a summary of the provided content."""
    result = await ctx.sample(f"Please summarize this:\n\n{content}")
    return result.text