Understanding MCP Servers: How LLMs Use Tools in the Real World

Model Context Protocol (MCP) servers are the key to making LLMs actually useful beyond chat. In this post, we explore how they work, how they decide which tools to use, and how to build your own orchestrator and server stack using FastAPI.

Large Language Models (LLMs) are amazing at generating text, but they don’t know how to do things unless you give them tools. That’s where Model Context Protocol (MCP) servers come in.

This post will explain:

  • What MCP servers are in layperson's terms
  • How the LLM knows which tool to use
  • Where orchestration happens
  • Visual diagrams of single and multi-server architectures
  • How to build your own MCP setup using FastAPI

Imagine You’re Playing a Video Game

Let’s say you have a super-smart robot assistant. You say:

“Go find me the best weapon!”

To do this, the robot needs to:

  1. Understand your request
  2. Figure out what tools it has (map, catalog, etc.)
  3. Choose the right tool
  4. Use the tool, then return with the answer

This is exactly what happens when an LLM uses tools with the help of an MCP server.


What’s a Tool?

A tool is like a mini-app the LLM can use:

  • A calculator
  • A weather search
  • A database query
  • A document retriever

But the LLM doesn’t automatically know what tools exist. We need a way to tell it what’s available and to help it call those tools. That’s what an MCP server does.
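
Concretely, you can picture the MCP server’s tool list as a small registry that pairs each function with a human-readable description. Here is a minimal sketch; the get_weather and search_database functions are placeholders, not part of any real API:

def get_weather(city: str) -> str:
    # Placeholder tool: a real implementation would call a weather API.
    return f"The weather in {city} is 72°F and sunny."

def search_database(query: str) -> list[str]:
    # Placeholder tool: a real implementation would query an actual database.
    return [f"Result for '{query}'"]

# The registry the MCP server advertises to the LLM:
# each entry maps a tool name to its function and a one-line description.
TOOLS = {
    "getWeather": {
        "fn": get_weather,
        "description": "getWeather(city: string) - current weather for a city",
    },
    "searchDatabase": {
        "fn": search_database,
        "description": "searchDatabase(query: string) - search internal records",
    },
}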


What Is an MCP Server?

An MCP server is the conductor between the LLM and external tools. It:

  • Tells the LLM what tools are available
  • Watches for tool calls from the LLM
  • Executes the tool
  • Sends the result back to the LLM
  • Lets the LLM finish its response using that tool output

It’s like Jarvis in the Iron Man suit: the LLM is smart, but the MCP server handles the logistics.
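
In code, that conductor role boils down to a small loop. The sketch below assumes the TOOLS registry from the previous section, a hypothetical call_llm helper standing in for whatever LLM client you use, a build_system_prompt helper sketched in a later section, and the JSON tool-call format shown in the next section:

import json

def run_tool(name: str, args: dict) -> str:
    # Look up the requested tool in the registry and execute it.
    return str(TOOLS[name]["fn"](**args))

def handle_message(user_input: str) -> str:
    # 1. Tell the LLM which tools are available.
    system_prompt = build_system_prompt(TOOLS)

    # 2. Ask the LLM for a reply (call_llm is a placeholder for your model client).
    reply = call_llm(system_prompt, user_input)

    # 3. Watch for a tool call in the reply.
    try:
        tool_call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # No tool call: the reply is already the final answer.
    if not isinstance(tool_call, dict) or "tool" not in tool_call:
        return reply

    # 4. Execute the tool and hand the result back to the LLM.
    result = run_tool(tool_call["tool"], tool_call.get("args", {}))

    # 5. Let the LLM finish its response using the tool output.
    return call_llm(system_prompt, user_input, tool_result=result)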


Architecture: Single MCP Flow

Here’s how it works step by step:

User → MCP Server → LLM (decides to use a tool)
                     ↓ tool call
              MCP Server runs the tool
                     ↓ tool result
                    LLM → Final Response → User

But How Does the LLM Know the Tools?

It doesn’t — unless you inject that information into the prompt.

The MCP server builds a system prompt like this:

You can use these tools:
- getWeather(city: string)
- searchDatabase(query: string)

Respond with:
{ "tool": "getWeather", "args": { "city": "Miami" } }

Then the model knows what’s in its “tool belt” and can call a tool if needed.
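
One way to produce that prompt is to render it from the tool registry itself. A possible sketch, reusing the hypothetical TOOLS registry from earlier:

def build_system_prompt(tools: dict) -> str:
    # List each tool by its description, then spell out the JSON format
    # the model should use when it wants to call one.
    lines = ["You can use these tools:"]
    for entry in tools.values():
        lines.append(f"- {entry['description']}")
    lines.append("")
    lines.append("To call a tool, respond with JSON only, for example:")
    lines.append('{ "tool": "getWeather", "args": { "city": "Miami" } }')
    lines.append("Otherwise, answer the user directly.")
    return "\n".join(lines)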


Example MCP Server in FastAPI

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    user_input: str

def get_weather(city: str) -> str:
    # Stand-in tool: a real implementation would call a weather API.
    return f"The weather in {city} is 72°F and sunny."

@app.post("/chat")
def chat(req: ChatRequest):
    # Toy tool selection: match on a keyword instead of asking an LLM,
    # and hard-code the city for brevity.
    if "weather" in req.user_input.lower():
        return {"response": get_weather("Boston")}
    return {"response": "Sorry, I can’t help with that."}

Note: This is a simplified illustration of the HTTP routing concept, not a full MCP protocol implementation. To run it, install the dependencies (pip install fastapi uvicorn) and start the server, either by adding uvicorn.run(app) or by launching it with the uvicorn CLI.
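
Once the server is running locally, you can exercise the endpoint with a quick request. This assumes uvicorn’s default port of 8000:

import httpx

# Ask the toy server a weather question; the keyword match above
# routes it to the get_weather tool.
resp = httpx.post(
    "http://localhost:8000/chat",
    json={"user_input": "What's the weather like today?"},
)
print(resp.json())  # {'response': 'The weather in Boston is 72°F and sunny.'}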


Multi-MCP Topology

Let’s say you have multiple teams or domains, each with their own tools. You can deploy multiple MCP servers and put a central orchestrator in front:

                 User
                  ↓
          MCP Orchestrator
           ↙     ↓     ↘
       MCP A   MCP B   MCP C
        ↓        ↓        ↓
      Tools    Tools    Tools

Minimal Multi-Server Setup

orchestrator.py

import httpx
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/chat")
async def route_request(request: Request):
    user_input = (await request.json())["user_input"]

    # Naive keyword routing: pick the MCP server that owns the relevant tools.
    if "weather" in user_input:
        url = "http://localhost:8001/chat"   # weather MCP server
    elif "calculate" in user_input:
        url = "http://localhost:8002/chat"   # calculator MCP server
    else:
        url = "http://localhost:8003/chat"   # default MCP server

    # Forward the request and relay the downstream server's JSON response.
    async with httpx.AsyncClient() as client:
        resp = await client.post(url, json={"user_input": user_input})
        return resp.json()

Note: This sketch still omits error handling, timeouts, and other production concerns. Install the dependencies with pip install fastapi uvicorn httpx and run each service with uvicorn.

Each MCP server (A, B, C) would have its own tools and LLM access.
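
For example, the weather-focused server behind port 8001 could mirror the single-server example from earlier, just running as its own process. A minimal sketch; the filename, port, and hard-coded city are assumptions chosen to match the orchestrator's routing table:

weather_server.py

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    user_input: str

@app.post("/chat")
def chat(req: ChatRequest):
    # Stand-in weather tool: a real server would call a weather API
    # (and typically its own LLM) before answering.
    return {"response": "The weather in Boston is 72°F and sunny."}

# Run with: uvicorn weather_server:app --port 8001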


What Can You Build With This?

  • A plug-and-play LLM assistant that picks up new capabilities securely and modularly
  • Tenant-aware toolchains for AI apps
  • On-prem or multi-region AI tools with intelligent routing
  • Enterprise-grade automation bots with composable capabilities

Final Thoughts

The Model Context Protocol (MCP) is a powerful and flexible way to extend LLMs with real-world abilities, while keeping everything structured and composable.

You now have:

  • A mental model
  • Diagrams
  • A FastAPI prototype
  • A multi-server architecture

Have questions or want to go deeper? Reach out, or copy the code snippets and start building your own MCP server.