Deploying Remote MCP Servers with FastAPI Explained in Under 10 Minutes
Welcome back! In today's article, we're going to explore how to create and deploy remote MCP servers. We'll build an MCP server locally and then make it accessible over streamable HTTP by hosting it on a remote server. This allows us to provide multiple, scalable MCP servers for our clients or AI agents.
To achieve this, we will cover several key topics:
1. What is an MCP Server? A quick refresher on MCP and how it works.
2. Example MCP Server: We'll use Tavily to build a server that can search the internet.
3. Debugging with Inspector: Learn how to debug a remote MCP server, which differs slightly from debugging a local stdio server.
4. Testing in VS Code: We'll connect our server to GitHub Copilot to test its functionality.
5. Hosting Multiple Servers: We'll see how to mount numerous MCP servers within a single FastAPI application to optimize resource usage.
6. Deployment: Finally, we'll deploy our application and connect to it over HTTP.
Let's get started.
Understanding the MCP Protocol
For those unfamiliar with MCP, here’s a brief overview. MCP (Model Context Protocol) is a standardized protocol that connects an AI application (like Cursor, Copilot, or Claude Code) to a set of external tools. An agent often needs access to the internet, local files, or internal documentation to perform its tasks. MCP provides a common language for tool developers and agent developers to communicate, allowing any MCP-compatible agent to use any MCP-compatible tool server.
For instance, the Cursor directory features many MCP servers, such as Firecrawl, which turns entire websites into LLM-ready data.
The concept of agents using tools isn't new. Traditionally, tools were coded directly within the agent itself. A common pattern is the ReAct (Reasoning and Acting) prompting model. Here’s how it typically works:
- User Query: A user sends a request, like "Create an integration with the Supabase SDK."
- Agent Thinks: The agent analyzes the request and decides on a plan. It might think, "I need to find the latest Supabase SDK documentation."
- Agent Acts: The agent calls a tool, such as a `search_web` tool, to get the information.
- Observation: The tool returns the search results.
- Agent Thinks Again: Based on the new information, the agent might decide to use another tool, like `read_file`, to examine the existing codebase.

This think-act-observe loop continues until the task is complete.
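The loop above can be sketched in a few lines of plain Python. This is a minimal illustration only: the `search_web` and `read_file` functions are hypothetical stand-ins, and the "plan" is fixed in advance, whereas a real agent would let an LLM choose the next action after each observation.

```python
def search_web(query: str) -> str:
    """Hypothetical tool: pretend to search the web."""
    return f"results for: {query}"

def read_file(path: str) -> str:
    """Hypothetical tool: pretend to read a file."""
    return f"contents of: {path}"

# The agent's toolkit, keyed by tool name
TOOLS = {"search_web": search_web, "read_file": read_file}

def run_agent(task: str, plan: list) -> list:
    """Execute a fixed plan of (tool_name, argument) steps,
    collecting one observation per action."""
    observations = []
    for tool_name, argument in plan:   # Think: pick the next step
        result = TOOLS[tool_name](argument)  # Act: call the tool
        observations.append(result)          # Observe: record the result
    return observations

obs = run_agent(
    "Create an integration with the Supabase SDK",
    [("search_web", "latest Supabase SDK documentation"),
     ("read_file", "src/db.py")],
)
```

MCP's contribution is that the entries in `TOOLS` no longer have to live inside the agent's own codebase.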
The innovation of MCP is that the tools are no longer hard-coded into the agent. Instead, you can have multiple, independent toolkits. One toolkit might provide access to your Confluence data, while another offers web-scraping capabilities. As long as your agent is MCP-compatible, it can connect to and use any of these toolkits.
In this article, we will build an MCP server and make it available over HTTP. Furthermore, we'll mount multiple MCP servers into a single FastAPI application, so you don't need to deploy dozens of instances to host dozens of different MCP servers.
Building a Web-Searching MCP Server
While this article won't cover the entire process of creating an MCP server from scratch, we'll review a complete example that you can adapt. We'll focus on making an existing server HTTP-compatible.
Here is a server that exposes a web search tool using the Tavily API for programmatic internet searches.
```python
# server.py
import os

from mcp.server.fastmcp import FastMCP
from tavily import TavilyClient

# Initialize the Tavily client with your API key
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Initialize the MCP server, binding to all interfaces on port 8000
mcp = FastMCP("web-search", host="0.0.0.0", port=8000)

@mcp.tool()
def web_search(query: str) -> dict:
    """
    Use this tool to search the web for information using the Tavily API.

    :param query: The search query.
    :return: The search results.
    """
    return tavily_client.search(query)

if __name__ == "__main__":
    # CRITICAL: use the 'streamable-http' transport for remote access.
    # The default, 'stdio', is for local execution only.
    mcp.run(transport="streamable-http")
```
A few key points about this code:
- Tool Definition: The `web_search` function is decorated with `@mcp.tool()`, turning it into a tool the agent can execute.
- Function Naming: The function name (`web_search`) must be descriptive, as the agent uses it to decide whether to call the tool.
- Docstring as Prompt: The docstring is not just for documentation; it acts as a prompt for the agent, explaining when and how to use the tool. Be explicit and clear.
- Transport Layer: The most important change for remote deployment is setting `transport="streamable-http"`. The default, `stdio`, is for local execution only. HTTP allows the server to be accessed from anywhere via a URL.
To run this server locally, first activate your virtual environment and then execute the script:
```bash
uv run server.py
```

Your server should now be running on `http://0.0.0.0:8000`.
Debugging Your HTTP MCP Server
Debugging an HTTP-based MCP server requires a specific approach. You can't simply send ad-hoc requests from a standard HTTP client like Postman, because the server expects JSON-RPC messages and a proper session handshake.
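To give a sense of what the server expects, here is a sketch of the JSON-RPC `initialize` message that opens an MCP session. The field values (`protocolVersion`, `clientInfo`) are illustrative; the exact protocol version string depends on the spec revision your SDK targets.

```python
import json

# A sketch of the JSON-RPC 'initialize' request that opens an MCP session.
# Field values such as protocolVersion and clientInfo are illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # assumed spec revision
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
}

# This is the body a client would POST to the /mcp endpoint
body = json.dumps(initialize_request)
```

Hand-crafting these messages is tedious, which is exactly why a dedicated debugging tool exists.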
The Anthropic team created the MCP Inspector application to simplify this process; you can launch it locally with `npx @modelcontextprotocol/inspector`.
Note: When testing, there are a few tricky details to remember.
- Open the Inspector.
- Change the Transport Type from the default `stdio` to `streamable-http`.
- Enter the URL where your server is running. Crucially, you must append `/mcp` to the path. For our local server, the URL would be `http://localhost:8000/mcp`.

If you connect without `/mcp`, you will get a `404 Not Found` error.
Once connected, you can test your tools. For example, running the `web_search` tool with the query "Who is Sam Altman?" will call the Tavily API and return the latest search results.
Integrating with a Client: VS Code Copilot
With the server running and verified, let's connect it to an agent. We'll use GitHub Copilot in VS Code.
- In Copilot, select Add MCP Server.
- Choose the HTTP option.
- Enter the URL where your server is hosted.
Critical Tip: For some reason, the URL must end with a trailing slash (`/`). If you omit it, the connection will likely fail. The correct URL is `http://localhost:8000/mcp/`.
After adding the server (e.g., naming it `tavily-search`), you can test it in the chat. Ask a question that requires a web search, like "Search on the internet who is Sam Altman." Copilot should recognize the need to use your tool, and you'll see a prompt to allow the action. Once you approve, it will return the results from your server.
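If you prefer configuring the connection by hand, VS Code also reads MCP servers from a `.vscode/mcp.json` file in your workspace. A minimal entry for this setup might look like the following (the server name `tavily-search` is just a label; note the trailing slash on the URL):

```json
{
  "servers": {
    "tavily-search": {
      "type": "http",
      "url": "http://localhost:8000/mcp/"
    }
  }
}
```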
Hosting Multiple MCP Servers in One FastAPI App
If you have five or more MCP servers, deploying each one in a separate instance is inefficient. A better approach is to mount all of them into a single FastAPI application, with each server exposed on a different path.
Here’s how you can do it. Imagine you have two simple servers, `eco_server` and `math_server`.
```python
# main_server.py
import contextlib

from fastapi import FastAPI

# Import your MCP server instances
from .eco_server import mcp as eco_mcp
from .math_server import mcp as math_mcp

@contextlib.asynccontextmanager
async def lifespan(app: FastAPI):
    # Run the session managers of all mounted MCP servers concurrently
    async with contextlib.AsyncExitStack() as stack:
        await stack.enter_async_context(eco_mcp.session_manager.run())
        await stack.enter_async_context(math_mcp.session_manager.run())
        yield

app = FastAPI(lifespan=lifespan)

# Mount each MCP server's streamable HTTP app on its own path
app.mount("/eco", eco_mcp.streamable_http_app())
app.mount("/math", math_mcp.streamable_http_app())
```
To run this application, you would serve the FastAPI app with uvicorn:

```bash
uv run uvicorn main_server:app
```
Testing Mounted Servers
Testing mounted servers in the Inspector is also a bit tricky. The URL must follow the pattern `BASE_URL/MOUNT_PATH/mcp`:
- To test the eco server, use `http://localhost:8000/eco/mcp`.
- To test the math server, use `http://localhost:8000/math/mcp`.
Connecting to these endpoints will allow you to list and run the tools for each respective server.
Deploying to the Cloud
Now, let's deploy our multi-server application. We'll use Render, which offers a free tier suitable for learning and testing.
Prepare for Deployment: Modify your server to use the port specified by the `PORT` environment variable, which is standard for most hosting providers.

```python
# In your mcp.run() or uvicorn.run() call
port=int(os.environ.get("PORT", 8000))
```
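As a quick sanity check, the fallback logic behaves like this (a small illustrative helper, not part of any SDK):

```python
def resolve_port(env: dict) -> int:
    """Pick the serving port from an environment mapping,
    falling back to 8000 when PORT is not set."""
    return int(env.get("PORT", 8000))

# Locally, with no PORT set, we fall back to 8000;
# a host like Render injects its own value at runtime.
default_port = resolve_port({})
injected_port = resolve_port({"PORT": "10000"})
```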
Push to GitHub: Commit your code and push it to a GitHub repository.
Deploy on Render:
- Create a new Web Service on Render and connect your GitHub account.
- Select your repository.
- Set the Build Command (e.g., `uv sync`).
- Set the Start Command to run your FastAPI application (e.g., `uv run uvicorn main_server:app --host 0.0.0.0 --port $PORT`).
- Add any necessary environment variables (like your `TAVILY_API_KEY`).
- Deploy the service.
Final Verification
Once deployed, Render will provide a public URL for your application (e.g., `https://my-mcp-app.onrender.com`).
You can test this remote server just like you did locally:
- Inspector: Connect to `https://my-mcp-app.onrender.com/eco/mcp` to test the eco server.
- VS Code: Add a new MCP server using the remote URL (`https://my-mcp-app.onrender.com/eco/mcp/`) to connect Copilot to your deployed eco tool.
After starting a new chat and testing the tool, you should see it working perfectly, running entirely from your remote server.
This covers the end-to-end process of creating, combining, and deploying remote MCP servers. As a next step, you could explore adding an authentication layer to secure your servers so that only authorized clients can connect.
Join the 10xdev Community
Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.