Deploying Remote MCP Servers with FastAPI Explained in Under 10 Minutes
Welcome back! In today's article, we're going to explore how to create and deploy remote MCP servers. We'll build an MCP server locally and then make it accessible over streamable HTTP by hosting it on a remote server. This allows us to provide multiple, scalable MCP servers for our clients or AI agents.
To achieve this, we will cover several key topics:
1. What is an MCP Server? A quick refresher on MCP and how it works.
2. Example MCP Server: We'll use Tavily to build a server that can search the internet.
3. Debugging with Inspector: Learn how to debug a remote MCP server, which differs slightly from debugging a local stdio server.
4. Testing in VS Code: We'll connect our server to GitHub Copilot to test its functionality.
5. Hosting Multiple Servers: We'll see how to mount numerous MCP servers within a single FastAPI application to optimize resource usage.
6. Deployment: Finally, we'll deploy our application and connect to it over HTTP.
Let's get started.
Understanding the MCP Protocol
For those unfamiliar with MCP, here’s a brief overview. MCP (Model Context Protocol) is a standardized protocol that connects an AI application (like Cursor, Copilot, or Claude Code) to a set of external tools. An agent often needs access to the internet, local files, or internal documentation to perform its tasks. MCP provides a common language for tool developers and agent developers to communicate, allowing any MCP-compatible agent to use any MCP-compatible tool server.
For instance, the Cursor directory features many MCP servers, such as Firecrawl, which turns entire websites into LLM-ready data.
The concept of agents using tools isn't new. Traditionally, tools were coded directly within the agent itself. A common pattern is the ReAct (Reasoning and Acting) prompting model. Here’s how it typically works:
- User Query: A user sends a request, like "Create an integration with the Supabase SDK."
- Agent Thinks: The agent analyzes the request and decides on a plan. It might think, "I need to find the latest Supabase SDK documentation."
- Agent Acts: The agent calls a tool, such as a `search_web` tool, to get the information.
- Observation: The tool returns the search results.
- Agent Thinks Again: Based on the new information, the agent might decide to use another tool, like `read_file`, to examine the existing codebase.

This think-act-observe loop continues until the task is complete.
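The loop above can be sketched in a few lines of plain Python. This is a minimal illustration only: the `search_web` and `read_file` functions are hypothetical stand-ins, and the "plan" is fixed in advance, whereas a real agent would let an LLM choose the next action after each observation.

```python
def search_web(query: str) -> str:
    """Hypothetical tool: pretend to search the web."""
    return f"results for: {query}"

def read_file(path: str) -> str:
    """Hypothetical tool: pretend to read a file."""
    return f"contents of: {path}"

# The agent's toolkit, keyed by tool name
TOOLS = {"search_web": search_web, "read_file": read_file}

def run_agent(task: str, plan: list) -> list:
    """Execute a fixed plan of (tool_name, argument) steps,
    collecting one observation per action."""
    observations = []
    for tool_name, argument in plan:   # Think: pick the next step
        result = TOOLS[tool_name](argument)  # Act: call the tool
        observations.append(result)          # Observe: record the result
    return observations

obs = run_agent(
    "Create an integration with the Supabase SDK",
    [("search_web", "latest Supabase SDK documentation"),
     ("read_file", "src/db.py")],
)
```

MCP's contribution is that the entries in `TOOLS` no longer have to live inside the agent's own codebase.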
The innovation of MCP is that the tools are no longer hard-coded into the agent. Instead, you can have multiple, independent toolkits. One toolkit might provide access to your Confluence data, while another offers web-scraping capabilities. As long as your agent is MCP-compatible, it can connect to and use any of these toolkits.
In this article, we will build an MCP server and make it available over HTTP. Furthermore, we'll mount multiple MCP servers into a single FastAPI application, so you don't need to deploy dozens of instances to host dozens of different MCP servers.
Building a Web-Searching MCP Server
While this article won't cover the entire process of creating an MCP server from scratch, we'll review a complete example that you can adapt. We'll focus on making an existing server HTTP-compatible.
Here is a server that exposes a web search tool using the Tavily API for programmatic internet searches.
```python
# server.py
import os

from mcp.server.fastmcp import FastMCP
from tavily import TavilyClient

# Initialize the Tavily client with your API key
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Initialize the MCP server, binding to all interfaces on port 8000
mcp = FastMCP("web-search", host="0.0.0.0", port=8000)

@mcp.tool()
def web_search(query: str) -> dict:
    """
    Use this tool to search the web for information using the Tavily API.

    :param query: The search query.
    :return: The search results.
    """
    return tavily_client.search(query)

if __name__ == "__main__":
    # CRITICAL: use the 'streamable-http' transport for remote access.
    # The default, 'stdio', is for local execution only.
    mcp.run(transport="streamable-http")
```
A few key points about this code:
- Tool Definition: The `web_search` function is decorated with `@mcp.tool()`, turning it into a tool the agent can execute.
- Function Naming: The function name (`web_search`) must be descriptive, as the agent uses it to decide whether to call the tool.
- Docstring as Prompt: The docstring is not just for documentation; it acts as a prompt for the agent, explaining when and how to use the tool. Be explicit and clear.
- Transport Layer: The most important change for remote deployment is setting `transport="streamable-http"`. The default, `stdio`, is for local execution only. HTTP allows the server to be accessed from anywhere via a URL.
To run this server locally, first activate your virtual environment and then execute the script:
```bash
uv run server.py
```

Your server should now be running on `http://0.0.0.0:8000`.
Debugging Your HTTP MCP Server
Debugging an HTTP-based MCP server requires a specific approach. You can't simply send ad-hoc requests from a standard HTTP client like Postman, because the server expects JSON-RPC messages and a proper session handshake.
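To give a sense of what the server expects, here is a sketch of the JSON-RPC `initialize` message that opens an MCP session. The field values (`protocolVersion`, `clientInfo`) are illustrative; the exact protocol version string depends on the spec revision your SDK targets.

```python
import json

# A sketch of the JSON-RPC 'initialize' request that opens an MCP session.
# Field values such as protocolVersion and clientInfo are illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # assumed spec revision
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
}

# This is the body a client would POST to the /mcp endpoint
body = json.dumps(initialize_request)
```

Hand-crafting these messages is tedious, which is exactly why a dedicated debugging tool exists.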
The Anthropic team created the MCP Inspector application to simplify this process; you can launch it locally with `npx @modelcontextprotocol/inspector`.
Note: When testing, there are a few tricky details to remember.
- Open the Inspector.
- Change the Transport Type from the default `stdio` to `streamable-http`.
- Enter the URL where your server is running. Crucially, you must append `/mcp` to the path. For our local server, the URL would be `http://localhost:8000/mcp`.

If you connect without `/mcp`, you will get a `404 Not Found` error.
Once connected, you can test your tools. For example, running the `web_search` tool with the query "Who is Sam Altman?" will call the Tavily API and return the latest search results.
Integrating with a Client: VS Code Copilot
With the server running and verified, let's connect it to an agent. We'll use GitHub Copilot in VS Code.
- In Copilot, select Add MCP Server.
- Choose the HTTP option.
- Enter the URL where your server is hosted.
Critical Tip: For some reason, the URL must end with a trailing slash (`/`). If you omit it, the connection will likely fail. The correct URL is `http://localhost:8000/mcp/`.
After adding the server (e.g., naming it `tavily-search`), you can test it in the chat. Ask a question that requires a web search, like "Search on the internet who is Sam Altman." Copilot should recognize the need to use your tool, and you'll see a prompt to allow the action. Once you approve, it will return the results from your server.
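If you prefer configuring the connection by hand, VS Code also reads MCP servers from a `.vscode/mcp.json` file in your workspace. A minimal entry for this setup might look like the following (the server name `tavily-search` is just a label; note the trailing slash on the URL):

```json
{
  "servers": {
    "tavily-search": {
      "type": "http",
      "url": "http://localhost:8000/mcp/"
    }
  }
}
```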
Hosting Multiple MCP Servers in One FastAPI App
If you have five or more MCP servers, deploying each one in a separate instance is inefficient. A better approach is to mount all of them into a single FastAPI application, with each server exposed on a different path.
Here’s how you can do it. Imagine you have two simple servers, `eco_server` and `math_server`.
```python
# main_server.py
import contextlib

from fastapi import FastAPI

# Import your MCP server instances
from .eco_server import mcp as eco_mcp
from .math_server import mcp as math_mcp

@contextlib.asynccontextmanager
async def lifespan(app: FastAPI):
    # Run the session managers of all mounted MCP servers concurrently
    async with contextlib.AsyncExitStack() as stack:
        await stack.enter_async_context(eco_mcp.session_manager.run())
        await stack.enter_async_context(math_mcp.session_manager.run())
        yield

app = FastAPI(lifespan=lifespan)

# Mount each MCP server's streamable HTTP app on its own path
app.mount("/eco", eco_mcp.streamable_http_app())
app.mount("/math", math_mcp.streamable_http_app())
```
To run this application, you would serve the FastAPI app with uvicorn:

```bash
uv run uvicorn main_server:app
```
Testing Mounted Servers
Testing mounted servers in the Inspector is also a bit tricky. The URL must follow the pattern `BASE_URL/MOUNT_PATH/mcp`:
- To test the eco server, use `http://localhost:8000/eco/mcp`.
- To test the math server, use `http://localhost:8000/math/mcp`.
Connecting to these endpoints will allow you to list and run the tools for each respective server.
Deploying to the Cloud
Now, let's deploy our multi-server application. We'll use Render, which offers a free tier suitable for learning and testing.
Prepare for Deployment: Modify your server to use the port specified by the `PORT` environment variable, which is standard for most hosting providers.

```python
# In your mcp.run() or uvicorn.run() call
port=int(os.environ.get("PORT", 8000))
```
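As a quick sanity check, the fallback logic behaves like this (a small illustrative helper, not part of any SDK):

```python
def resolve_port(env: dict) -> int:
    """Pick the serving port from an environment mapping,
    falling back to 8000 when PORT is not set."""
    return int(env.get("PORT", 8000))

# Locally, with no PORT set, we fall back to 8000;
# a host like Render injects its own value at runtime.
default_port = resolve_port({})
injected_port = resolve_port({"PORT": "10000"})
```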
Push to GitHub: Commit your code and push it to a GitHub repository.
Deploy on Render:
- Create a new Web Service on Render and connect your GitHub account.
- Select your repository.
- Set the Build Command (e.g., `uv sync`).
- Set the Start Command to run your FastAPI application (e.g., `uv run uvicorn main_server:app --host 0.0.0.0 --port $PORT`).
- Add any necessary environment variables (like your `TAVILY_API_KEY`).
- Deploy the service.
Final Verification
Once deployed, Render will provide a public URL for your application (e.g., `https://my-mcp-app.onrender.com`).
You can test this remote server just like you did locally:
- Inspector: Connect to `https://my-mcp-app.onrender.com/eco/mcp` to test the eco server.
- VS Code: Add a new MCP server using the remote URL (`https://my-mcp-app.onrender.com/eco/mcp/`) to connect Copilot to your deployed eco tool.
After starting a new chat and testing the tool, you should see it working perfectly, running entirely from your remote server.
This covers the end-to-end process of creating, combining, and deploying remote MCP servers. As a next step, you could explore adding an authentication layer to secure your servers so that only authorized clients can connect.
Join the 10xdev Community
Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.