OpenAI's MCP Client Support Explained in 5 Minutes
OpenAI recently introduced MCP (Model Context Protocol) client support in its Responses API. If that sounds complex, don't worry. We'll break it down in this article.
What is Model Context Protocol (MCP)?
First, let's define Model Context Protocol (MCP). A protocol is simply a set of rules for communication. In this case, it lets your Large Language Model (LLM) application talk to a variety of external tools and resources. This provides your model with richer context, making your applications more powerful and context-aware.
A key development you might have missed is the introduction of remote MCP servers. You can now run a dedicated server—for instance, on a platform like Cloudflare—that your AI application can connect to. We'll even touch on how you can build one yourself later in this article.
This is significant because several clients, such as Cursor or Anthropic's Claude, can now connect to these servers over MCP. That gives them access to a suite of tools, so they can turn natural language requests into tool calls that carry out complex tasks.
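If you peek under the hood, those messages are plain JSON-RPC 2.0. Here is a rough sketch of the two requests a client sends to discover and invoke tools; the method names come from the MCP specification, while the tool name and arguments are purely illustrative:

// Rough sketch of the JSON-RPC 2.0 messages an MCP client sends to a server.
// Method names come from the MCP spec; the tool name is illustrative only.
const listToolsRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/list' // ask the server which tools it exposes
};

const callToolRequest = {
  jsonrpc: '2.0',
  id: 2,
  method: 'tools/call',
  params: {
    name: 'exampleTool',                      // a tool the server advertised
    arguments: { query: 'whatever the model decided to pass' }
  }
};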
The protocol also supports prompts, resources, and sampling, but we won't delve into those details today. Instead, let's jump into a practical demonstration to see this in action.
A Practical Demo: The "Whoa" Diary
Often, when experimenting with AI, I have a 'whoa' moment—a surprising and delightful experience I wish I could log. To solve this, I built a simple OpenAI chat application connected to an MCP server. Every time I type 'whoa' in the chat, the app will use a tool to track that moment, creating a 'whoa' diary.
Let's start with a simple prompt in our chat app:
User: Can you speak pig Latin?
AI: An-cay I-pay eak-spay Ig-pay Atin-lay.
It works perfectly! That's quite impressive. My natural reaction is to say:
User: Whoa, that's rad that you can do that.
By typing 'whoa', I triggered a tool call to my MCP server. The AI responded:
AI: Caught you by surprise with my Pig Latin skills, you did.
Behind the scenes, a tool message was sent. When a tool call is made, OpenAI returns the name of the function that was called along with its arguments. In this case, the function was trackWhoa. The server received the context (that the assistant spoke in Pig Latin) and logged it, and the output from my function was then returned.
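If it helps to picture it, the MCP call shows up as an item in the response's output array, roughly like this. The field names follow my reading of OpenAI's remote MCP documentation, and the values are illustrative:

// Rough sketch of an MCP tool-call item in response.output
// (field names per OpenAI's remote MCP docs as I understand them).
const mcpCallItem = {
  type: 'mcp_call',
  name: 'trackWhoa',
  arguments: '{"reason":"The assistant replied in Pig Latin"}', // JSON string of the args
  output: 'Whoa moment tracked!'                                // whatever the MCP server returned
};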
My MCP server is connected to a database where this information is stored. Now, I can ask:
User: How many times have I said whoa?
AI: You have said 'whoa' more than 7 times.
The system uses the context that I am the logged-in user. This demonstrates how the tool's output is seamlessly integrated back into the chat, creating a personalized and context-aware experience. This is a powerful feature for developers looking to build custom applications.
Under the Hood: The Code
Let's examine the code that makes this possible. We are using OpenAI's new Responses API, not the standard Completions API. If you haven't explored the Responses API yet, it's highly recommended, as it unlocks numerous advanced features you might have overlooked.
You pass the user's message as input, along with the previous_response_id. This allows the API to maintain the conversational thread, and the response object that comes back is quite rich with information.
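A minimal sketch of that chaining looks like this; the openai client and the chat loop around it are assumed, and error handling is omitted:

// Minimal sketch of threading a conversation with previous_response_id.
// `openai` is an initialized OpenAI client.
let previousResponseId = null;

async function send(userMessage) {
  const response = await openai.responses.create({
    model: 'gpt-4o',
    input: userMessage,
    previous_response_id: previousResponseId ?? undefined
  });
  previousResponseId = response.id; // remember it so the next turn stays in the same thread
  return response.output_text;
}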
In this example, I'm using gpt-4o. The key part is the tools parameter: while you might be familiar with function calling, there's now a new tool type, mcp. Here's a simplified look at the code:
const response = await openai.responses.create({
  model: 'gpt-4o',
  input: '...',                 // the user's latest message
  previous_response_id: '...',  // the id of the previous response in this thread
  tools: [
    {
      type: 'mcp',
      server_label: 'whoa-diary',
      server_url: 'https://whoa.example.com/sse',
      headers: {
        'X-Username': 'craigsdennis'
      },
      require_approval: 'never'
    }
  ]
});
My MCP server is hosted on Cloudflare, and I'm using Server-Sent Events (SSE) for communication. Notice the headers object: this is how I pass authentication information, like a username or even a bearer token. This server-side approach is incredibly powerful because it allows for secure, authenticated, and user-specific tool usage. You can create customized toolsets and manage authorization for different users.
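For example, swapping the username header for a bearer token would just be another entry in that same object; the token value here is a placeholder:

// Example headers block for the mcp tool entry; the token is a placeholder.
headers: {
  'X-Username': 'craigsdennis',
  Authorization: 'Bearer <your-token-here>'
}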
The API also includes a require_approval flag for a human-in-the-loop workflow. I've set it to never because I want to automatically track every 'whoa' moment without a confirmation prompt.
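For the curious, my understanding of the approval flow (item and field names are from OpenAI's docs as I recall them, so treat this as a sketch): when approval is required, the response contains an approval request item that you answer on a follow-up call.

// Sketch of the human-in-the-loop path when require_approval is not 'never'.
// Item and field names follow my reading of OpenAI's remote MCP docs.
const approvalRequest = response.output.find(
  (item) => item.type === 'mcp_approval_request'
);

if (approvalRequest) {
  // After the human says yes, answer the request on the next call.
  await openai.responses.create({
    model: 'gpt-4o',
    previous_response_id: response.id,
    tools,  // same MCP tool configuration as before
    input: [
      {
        type: 'mcp_approval_response',
        approval_request_id: approvalRequest.id,
        approve: true
      }
    ]
  });
}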
The MCP Server Logic
Now, let's look at the MCP server code. The server receives a fetch request and extracts the X-Username header. If the username is missing, access is denied. This is where you would also validate a bearer token or other authentication credentials.
// Simplified server-side logic
export default {
async fetch(request, env, ctx) {
const username = request.headers.get('X-Username');
if (!username) {
return new Response('Unauthorized', { status: 401 });
}
// ... rest of the server logic
}
};
I then set the username in the context properties, making it available throughout the server's logic. To get started quickly, there are templates available to build a remote MCP server. With a single click, you can deploy a boilerplate to Cloudflare, which sets up the Git repository and local environment for you. It's a seamless process that simplifies development.
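As a sketch of what that wiring can look like: the WhoaMCP class and its serveSSE helper stand in for whatever the Cloudflare agents template generates, so treat the exact names as assumptions and check your boilerplate.

// Sketch of passing the username into the MCP agent on Cloudflare Workers.
// WhoaMCP and serveSSE come from the agents template; exact names may differ.
import { WhoaMCP } from './whoa-mcp';

export default {
  async fetch(request, env, ctx) {
    const username = request.headers.get('X-Username');
    if (!username) {
      return new Response('Unauthorized', { status: 401 });
    }
    ctx.props = { username }; // readable as props.username inside the tool handlers
    return WhoaMCP.serveSSE('/sse').fetch(request, env, ctx);
  }
};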
Once the server is set up, you primarily work within the init function to define your tools. There, I have access to the props.username that was passed through earlier.
To define a tool, you provide a name and a schema for its expected arguments. The more descriptive your schema, the more effectively the LLM can use the tool. This is a key difference from a traditional API; you are essentially creating a natural language front-end for your functions. You don't expose all your API endpoints, only the ones that are useful for the LLM to call.
The tool call also supports filtering, so you can specify that you only want to use a subset of the tools available on an MCP server, even if it hosts numerous tools.
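If I recall the parameter name correctly, that filtering lives on the same tool entry in the Responses API call, something like this:

// Restricting the model to a subset of the server's tools
// (the allowed_tools field is per OpenAI's remote MCP docs as I recall them).
const mcpTool = {
  type: 'mcp',
  server_label: 'whoa-diary',
  server_url: 'https://whoa.example.com/sse',
  allowed_tools: ['trackWhoa'],   // ignore any other tools the server exposes
  require_approval: 'never'
};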
Here's the implementation for the trackWhoa tool. I define the schema and then the action to take. The reason (a summary of why I said 'whoa') is passed in as a nicely typed argument. I then insert this reason, along with the username from the props, into a D1 database.
// Tool definition on the MCP server
const serverTools = {
  trackWhoa: {
    schema: {
      type: 'object',
      properties: {
        reason: { type: 'string', description: "The reason for the 'whoa' moment." }
      },
      required: ['reason']
    },
    // env is assumed to be in scope here: the Worker bindings, including the D1 database
    async handler({ reason }, { props, env }) {
      await env.DB.prepare('INSERT INTO whoas (username, reason) VALUES (?, ?)')
        .bind(props.username, reason)
        .run();
      return 'Whoa moment tracked!';
    }
  }
};
The value returned from this handler is what gets sent back to the LLM. It's a powerful and elegant way to extend AI capabilities.
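To round out the demo, the "how many times have I said whoa?" question would be served by a second, similar tool. Here is a hypothetical sketch; the getWhoaCount name and the SQL are mine, not from the original project:

// Hypothetical companion tool for the count question; name and query are illustrative.
getWhoaCount: {
  schema: { type: 'object', properties: {}, required: [] },
  async handler(_args, { props, env }) {
    const row = await env.DB.prepare('SELECT COUNT(*) AS total FROM whoas WHERE username = ?')
      .bind(props.username)
      .first();
    return `You have said 'whoa' ${row.total} times.`;
  }
}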
It's time for you to dive into this technology. There are numerous examples available to get you started. This new feature opens up a world of possibilities for creating more dynamic and intelligent applications.
Join the 10xdev Community
Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.