Understanding MCP Servers: How LLMs Use Tools in the Real World
Large Language Models (LLMs) are amazing at generating text, but they don't know how to do things unless you give them tools. That's where MCP (Model Context Protocol) servers come in.
This post will explain:
- What MCP servers are in layperson's terms
- How the LLM knows which tool to use
- Where orchestration happens
- Visual diagrams of single and multi-server architectures
- How to build your own MCP setup using FastAPI
Imagine You’re Playing a Video Game
Let’s say you have a super-smart robot assistant. You say:
“Go find me the best weapon!”
To do this, the robot needs to:
1. Understand your request
2. Figure out what tools it has (map, catalog, etc.)
3. Choose the right tool
4. Use the tool, then return with the answer
This is exactly what happens when an LLM uses tools with the help of an MCP server.
What’s a Tool?
A tool is like a mini-app the LLM can use:
- A calculator
- A weather search
- A database query
- A document retriever
But the LLM doesn’t automatically know what tools exist. We need a way to tell it, and help it use those tools. That’s what an MCP server does.
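To make this concrete, here's a minimal sketch of what a tool registry might look like in Python. The shape of the registry and the getWeather / searchDatabase names are made up for this post, not taken from the MCP spec:

# Illustrative tool registry -- the schema here is invented for this post,
# not the official MCP format.
def get_weather(city: str) -> str:
    return f"The weather in {city} is 72°F and sunny."

def search_database(query: str) -> str:
    return f"Found 3 records matching '{query}'."

TOOLS = {
    "getWeather": {
        "description": "Get the current weather for a city",
        "args": {"city": "string"},
        "func": get_weather,
    },
    "searchDatabase": {
        "description": "Search the internal database",
        "args": {"query": "string"},
        "func": search_database,
    },
}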
What Is an MCP Server?
An MCP server is the conductor between the LLM and external tools. It:
- Tells the LLM what tools are available
- Watches for tool calls from the LLM
- Executes the tool
- Sends the result back to the LLM
- Lets the LLM finish its response using that tool output
It's like Jarvis for your Iron Man suit: the LLM is smart, but the MCP server handles the logistics.
Architecture: Single MCP Flow
Here’s how it works step by step:
User → MCP Server → LLM (decides to use a tool)
                         ↓ tool call
            MCP Server runs the tool
                         ↓ tool result
            LLM → Final Response → User
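In code, that round trip might look something like the sketch below. It reuses the TOOLS registry from the previous section; llm_complete is a stand-in for whatever LLM client you use, not a real API. (How the tool list gets into the prompt is covered in the next section.)

import json

def llm_complete(messages):
    """Placeholder for your LLM client of choice -- swap in a real SDK call."""
    raise NotImplementedError

def handle_request(user_input: str) -> str:
    messages = [
        {"role": "system", "content": "You can use these tools: ..."},
        {"role": "user", "content": user_input},
    ]
    reply = llm_complete(messages)        # LLM sees the request plus its tool belt
    try:
        call = json.loads(reply)          # did the model emit a JSON tool call?
    except json.JSONDecodeError:
        return reply                      # no tool needed: return the plain answer
    result = TOOLS[call["tool"]]["func"](**call["args"])  # run the tool
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": f"Tool result: {result}"})
    return llm_complete(messages)         # LLM finishes its answer with the output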
But How Does the LLM Know the Tools?
It doesn’t — unless you inject that information into the prompt.
The MCP server builds a system prompt like this:
You can use these tools:
- getWeather(city: string)
- searchDatabase(query: string)
Respond with:
{ "tool": "getWeather", "args": { "city": "Miami" } }
Then the model knows what’s in its “tool belt” and can call a tool if needed.
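Here's a rough sketch of how the server could assemble that system prompt from the TOOLS registry defined earlier. The prompt format is illustrative, not the official MCP wire format:

def build_system_prompt(tools: dict) -> str:
    # Turn the registry into the "tool belt" description the LLM sees
    lines = ["You can use these tools:"]
    for name, spec in tools.items():
        args = ", ".join(f"{arg}: {typ}" for arg, typ in spec["args"].items())
        lines.append(f"- {name}({args})")
    lines.append('Respond with: {"tool": "<name>", "args": {...}}')
    return "\n".join(lines)

print(build_system_prompt(TOOLS))
# You can use these tools:
# - getWeather(city: string)
# - searchDatabase(query: string)
# Respond with: {"tool": "<name>", "args": {...}}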
Example MCP Server in FastAPI
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    user_input: str

def get_weather(city: str) -> str:
    return f"The weather in {city} is 72°F and sunny."

@app.post("/chat")
def chat(req: ChatRequest):
    # Naive keyword check standing in for a real LLM tool-call decision
    if "weather" in req.user_input.lower():
        return {"response": get_weather("Boston")}
    return {"response": "Sorry, I can't help with that."}
Note: This is a simplified example for illustration. To run it, you'll need to add server startup code (for example uvicorn.run(app, port=8000)) and install the dependencies (pip install fastapi uvicorn). It shows HTTP routing concepts rather than a full MCP protocol implementation.
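Once the server is running (assuming port 8000 as above), you can exercise it with a quick client call:

import httpx

resp = httpx.post(
    "http://localhost:8000/chat",
    json={"user_input": "What's the weather like today?"},
)
print(resp.json())  # {'response': 'The weather in Boston is 72°F and sunny.'}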
Multi-MCP Topology
Let’s say you have multiple teams or domains, each with their own tools. You can deploy multiple MCP servers and put a central orchestrator in front:
            User
             ↓
     MCP Orchestrator
      ↙      ↓      ↘
  MCP A    MCP B    MCP C
    ↓        ↓        ↓
  Tools    Tools    Tools
Minimal Multi-Server Setup
orchestrator.py
from fastapi import FastAPI, Request
import httpx

app = FastAPI()

@app.post("/chat")
async def route_request(request: Request):
    user_input = (await request.json())["user_input"]
    # Keyword routing: forward the request to the MCP server that owns the tools
    if "weather" in user_input:
        url = "http://localhost:8001/chat"
    elif "calculate" in user_input:
        url = "http://localhost:8002/chat"
    else:
        url = "http://localhost:8003/chat"
    async with httpx.AsyncClient() as client:
        resp = await client.post(url, json={"user_input": user_input})
        return resp.json()  # forward the downstream server's JSON response
Note: This fragment omits error handling and server startup; run it with uvicorn orchestrator:app (on port 8000, say) after installing the dependencies with pip install fastapi uvicorn httpx.
Each MCP server (A, B, C) would have its own tools and LLM access.
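To see the routing in action, assuming the orchestrator listens on port 8000 and servers A, B, and C are up on ports 8001-8003 as above, you could send a few different requests through it:

import httpx

# Each request below should land on a different downstream MCP server
for text in ["weather in Miami", "calculate 2 + 2", "tell me a joke"]:
    resp = httpx.post("http://localhost:8000/chat", json={"user_input": text})
    print(text, "->", resp.json())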
What Can You Build With This?
- A plug-and-play LLM assistant that calls external tools securely and modularly
- Tenant-aware toolchains for AI apps
- On-prem or multi-region AI tools with intelligent routing
- Enterprise-grade automation bots with composable capabilities
Final Thoughts
The Model Context Protocol (MCP) is a powerful and flexible way to extend LLMs with real-world abilities, while keeping everything structured and composable.
You now have:
- A mental model
- Diagrams
- A FastAPI prototype
- A multi-server architecture
Have questions or want to go deeper? Reach out, or copy the code snippets and start building your own MCP server.