WebMCP: The Future of AI-Powered Web Interaction


10xTeam · February 21, 2026 · 8 min read

There’s a new proposal backed by Google and Microsoft. It could shape the future of how we use the web.

And I like it.

It’s called WebMCP. But don’t confuse it with a normal MCP server.

Instead, WebMCP is a browser API. It will let front-end developers expose features of their sites as tools to AI agents. Essentially, it lets every site become a mini MCP server.

While some sites have already launched their own MCP servers, this is different. Its goal is to help agents use the website for you. Not just access your APIs and show results in a chat.

It’s entirely front-end based.

If that sounds a bit confusing, let’s jump into a demo.

A Glimpse into the Future

I’ll admit, this demo won’t look too exciting at first. But that’s the point of WebMCP. It’s taking something that’s already possible and just making it way better.

Here, I have the Canary version of Chrome where this proposal is being tested. And a site set up with some WebMCP tools.

Imagine the extension on the right is your browser's built-in AI, whether that's Gemini or whatever else the future holds.

I’ll give it a prompt: "I want to book a roundtrip flight for two people from London to New York on specific dates."

When I hit send, it takes me directly to the search results page. It used the website for me.

Crazy stuff, right? The key thing isn’t what it did, but how it did it.

The Old Way vs. The WebMCP Way

The current approach for AI using websites is clunky. It involves tools like Playwright, HTML parsing, or even taking screenshots. The AI tries to use the site like a human.

But all of that is inefficient, especially token-wise. And it’s prone to a lot of errors.

This is what WebMCP is here to fix.

WebMCP lets the website developer expose specific MCP tools. These tools then interact with the client-side JavaScript.

When an AI uses a WebMCP tool, it’s simply running a JavaScript function on your site. A function that you, the developer, have defined.

graph TD
    A[User Prompt: "Book a flight..."] --> B{Browser AI};
    B --> C{Finds `searchFlights` WebMCP Tool};
    C --> D[Calls Tool with Arguments<br>origin: London, dest: NY];
    D --> E[Executes Pre-defined JS Function];
    E --> F[Updates React State];
    F --> G[Navigates to Search Results Page];

On this demo flight page, there’s one tool available: searchFlights. It takes input arguments like origin, destination, and tripType. A perfect one-to-one match with the form on the page.

The AI knows it can use this tool. It doesn’t fill in the form with Playwright or parse any HTML. It doesn’t need to know what the website looks like at all.

It simply calls the WebMCP tool with the input arguments. The developer has already decided what happens next. In this case, a JavaScript function updates the React state, causing a navigation.

The Imperative API: Code in Action

Let’s look at the front-end code. It’s incredibly simple.

First, we register the WebMCP tools for the page. We do that using window.navigator.modelContext. This is the new API that would be built into browsers.

// Register the tools available for the page
if (window.navigator.modelContext) {
  const { modelContext } = window.navigator;
  modelContext.registerTool(searchFlightsTool);
}

Once we have our modelContext API, we use registerTool(). Here, I’m registering the searchFlights tool we saw earlier.

An actual tool is just a simple object definition.

  • It has a name.
  • It has a description, so the AI knows when to use it.
  • And an inputSchema if it takes arguments.

// Definition for the searchFlightsTool
const searchFlightsTool = {
  name: 'searchFlights',
  description: 'Finds and searches for available flights.',
  inputSchema: {
    type: 'object',
    properties: {
      origin: { type: 'string', description: 'The departure airport code.' },
      destination: { type: 'string', description: 'The arrival airport code.' },
      // ... other properties
    },
  },
  execute: searchFlights, // The JS function to run
};

The most important part is the execute function. This is the client-side JavaScript that runs when the tool is used.

In my case, I’m taking the parameters from the AI. And I’m dispatching a custom event called searchFlights.

// The 'execute' function dispatches a custom event
function searchFlights(params) {
  const event = new CustomEvent('searchFlights', { detail: params });
  window.dispatchEvent(event);
}

Then, in my React code, I just add an event listener. When that event fires, I run a function to handle the search. This is where I can do anything I want in React. Here, I’m just setting search parameters, which triggers the navigation.

It really is that simple.

[!TIP] This approach is incredibly token-efficient. It also allows the developer to define the exact interactions, creating guardrails for the AI.

It’s a neat solution for building sites with both a human and an AI assistant in mind.

More Than Just Forms

WebMCP tools aren’t just for navigation or filling forms. They’re also great for passing information from the page to the AI.

Say I, as a human, adjust some filters.

  • Price less than $500.
  • Departure time before midday.

There are still a lot of flights. I want the AI to help me choose the best one. So I can ask, "What flight would you recommend on this page?"

Current approaches would scrape the entire page. But with WebMCP, we don’t need to do that.

As the developer, I set up a tool called listFlights. This tool has access to the current React state. It can see all the flight information in a nice JSON format.

When I ask the AI for a recommendation, it calls that tool. It gets a clean list of the currently displayed flights. And it gives me a recommendation for Flight 56.
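The demo doesn't show the listFlights source, so here's a hedged sketch of what such a read-only tool could look like. The flight data and field names are made up for illustration; in the real page they would come from React state:

```javascript
// Illustration only: a read-only WebMCP tool that reports page state.
// In the demo this array would be the currently filtered React state.
const visibleFlights = [
  { id: 42, from: 'LHR', to: 'JFK', price: 430, departure: '08:15' },
  { id: 56, from: 'LHR', to: 'JFK', price: 395, departure: '11:40' },
];

const listFlightsTool = {
  name: 'listFlights',
  description: 'Returns the flights currently displayed on the page as JSON.',
  inputSchema: { type: 'object', properties: {} }, // takes no arguments
  execute: () => JSON.stringify(visibleFlights),
};
```

The AI never sees the page's HTML, only this compact JSON, which is why the token count stays so low.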

That process uses way fewer tokens and is far more accurate.

The Declarative API: Zero JavaScript

Now for the final showcase. How to use WebMCP with no JavaScript.

So far, we’ve used the imperative API, where the developer writes the JavaScript. There’s also a second approach: the declarative API.

This is much simpler. It's meant for the common use case of filling in HTML forms.

I have a simple booking reservation form. I can ask my AI to book a table with the necessary information. It goes ahead and fills that form in for me.

It has access to a WebMCP tool called bookTable. But I wrote no JavaScript to create this tool.

The declarative API works by adding attributes to your HTML form.

  • tool-name="bookTable"
  • tool-description="Books a table at the restaurant."

The browser then converts that form into a WebMCP tool for you. It tries to understand what each input should be for the tool’s arguments.

<form
  id="booking-form"
  tool-name="bookTable"
  tool-description="Books a table at the restaurant."
>
  <label for="name">Name:</label>
  <input type="text" id="name" name="name" required />

  <label for="phone">Phone:</label>
  <input
    type="tel"
    id="phone"
    name="phone"
    tool-param-description="Include country code"
    required
  />
  
  <!-- ... other form inputs ... -->

  <button type="submit">Book Table</button>
</form>

The only other difference is the tool-param-description attribute on some inputs. This gives the AI a bit more context on how to fill in the information.

The browser handles the rest. It picks up the input name, type, and other attributes to create the tool. All of that, with zero JavaScript.
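One way to picture what the browser is doing: from the form's attributes, it effectively derives something like the imperative tool object from earlier. The object below is an illustration of the idea, not the browser's actual output:

```javascript
// Illustration only: roughly the tool a browser could derive from the form.
// The real generated schema is up to the browser implementation.
const derivedBookTableTool = {
  name: 'bookTable',                               // from tool-name
  description: 'Books a table at the restaurant.', // from tool-description
  inputSchema: {
    type: 'object',
    properties: {
      name: { type: 'string' },  // from <input type="text" name="name">
      phone: {
        type: 'string',
        description: 'Include country code', // from tool-param-description
      },
    },
    required: ['name', 'phone'], // both inputs carry the required attribute
  },
};
```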

Final Thoughts

That’s pretty much all there is to the WebMCP proposal for now. And I’m pretty positive about it.

It bridges the gap between web apps and AI agents. It removes the guesswork when agents try to use a site. It ensures interactions are defined explicitly by the website’s developers.

I prefer having an AI help me out on the actual website. It’s a much better system for keeping a human in the loop. And it allows developers to define that experience.

But it’s worth remembering this is just a proposal. It might take some time to appear in browsers.

And there are still limitations to address. The classic one: security.

There could be poisoned tools and malicious descriptions on some websites. How much access should the AI have? How much control will the browser AI have over the entire browser? And if a poisoned tool goes rogue, how much damage can it do?

Hopefully, the proposal finds an answer for that, because I'd really like to see it succeed.

