WebMCP: A New Way for AI to Interact with Websites

Today, we’re diving into WebMCP.

It’s a brand-new specification that’s changing how we think about website interaction.

We’ll explain what it is, show a demo, and share a few hot takes.

Essentially, WebMCP is about surfacing tools and interaction methods directly through your website itself.

This is different from MCP servers, MCP UI, or MCP apps. It’s a new way to expose functionality without needing additional servers.

Let’s get into it.

The Problem: AI Interacting with Web Apps

Imagine a simple grocery list application.

It’s a basic Kanban board where you have columns for different grocery stores. Under each store, you can add, remove, and check off items.

Building an app like this involves standard features:

Add a store
Add an item
Reorder items
Rename an item
Delete an item
Check off an item

Now, what if you want an AI to use your application’s features?

You have a few options.

Build AI directly into your app. You foot the bill for the AI usage.
Create an MCP server. This exposes your tools, and an AI chat can communicate with it.
Use MCP UI (or MCP apps). This allows you to embed stylized components directly into a chat.

WebMCP offers a fourth, different path.

What is WebMCP?

WebMCP says, “I have a website, and here are the things it can do.”

When an AI programmatically opens a browser to use your site, it traditionally relies on tools like Playwright.

Playwright might dump the HTML, analyze the accessibility tree, or take screenshots. It then tries to use the website like a human, clicking buttons and filling forms.

In my experience, this is decent but extremely slow. The AI has to parse everything, figure out which buttons to press, and so on.

The solution?

Your website can explicitly publish the actions it supports.

A Practical Example: The Grocery App

In my grocery list app, I have several tools defined:

addItem
getItems
deleteItem
deleteStore
createStore
getItemsByStore
moveItem

This isn’t so different from a standard MCP server. You publish the tools and define their input and output schemas.

You tell the AI:

“Here’s a tool called addItem. Here’s what it does. To use it, you must provide the item name and the store. I will return the following information.”

It’s a structured way for an AI to communicate with your application.
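
To make that contract concrete, here is a minimal sketch of the kind of app logic a tool like addItem or moveItem might wrap. The in-memory model and the createGroceryStore helper are hypothetical, not the actual code behind my app; the point is that each function takes named inputs and returns a small, predictable result.

```javascript
// Hypothetical in-memory model for the grocery app; all names are illustrative.
function createGroceryStore() {
  const stores = new Map(); // store name -> array of { name, checked }

  // Look up a store's items, failing loudly if the store does not exist.
  function requireStore(store) {
    const items = stores.get(store);
    if (!items) throw new Error(`Unknown store: ${store}`);
    return items;
  }

  return {
    createStore(store) {
      if (!stores.has(store)) stores.set(store, []);
      return { store };
    },
    addItem(item, store) {
      requireStore(store).push({ name: item, checked: false });
      return { item, store };
    },
    moveItem(item, fromStore, toStore) {
      const source = requireStore(fromStore);
      const target = requireStore(toStore);
      const index = source.findIndex((entry) => entry.name === item);
      if (index === -1) throw new Error(`${item} is not in ${fromStore}`);
      target.push(...source.splice(index, 1));
      return { item, store: toStore };
    },
    getItemsByStore(store) {
      // Return copies so callers cannot mutate internal state.
      return requireStore(store).map((entry) => ({ ...entry }));
    },
  };
}

const groceries = createGroceryStore();
groceries.createStore("Costco");
groceries.createStore("Whole Foods");
groceries.addItem("apples", "Costco");
groceries.moveItem("apples", "Costco", "Whole Foods");
console.log(groceries.getItemsByStore("Whole Foods"));
```

Throwing on an unknown store gives the AI a clear error to react to instead of a silent no-op.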

This approach is a better fit than building a custom API from scratch, especially for existing web applications. It allows any of the millions of existing web apps to become AI-ready.

You can declaratively or imperatively publish the tools your website supports, right in the HTML or JavaScript.

Live Demo

Using a simple Chrome extension, I can load my grocery app page. The extension immediately finds all the available tools.

I can then give it a prompt:

“Please add bananas to the Costco shopping list. Once that’s done, show me everything on my grocery list. Oh, and can you then move bananas from Costco to Whole Foods?”

The AI parses the request and uses the available tools. It adds bananas to Costco, then moves them to Whole Foods.

I can try something more complex:

“Please add all items for chicken noodle soup to Whole Foods.”

Boom. It adds chicken broth, chicken breast, egg noodles, carrots, celery, and onion.

The AI’s ability to handle typos is a huge plus. I can misspell “chicken,” and it still understands what I mean.

How WebMCP Works

So, how does the browser know what tools are available?

There are two ways to declare them.

1. Imperative Declaration (JavaScript)

You can use JavaScript to register tools directly.

window.navigator.modelContext.registerTool({
  name: "addItem",
  description: "Adds a new item to a specified grocery store list.",
  inputSchema: {
    type: "object",
    properties: {
      item: { type: "string", description: "The item to add." },
      store: { type: "string", description: "The store to add the item to." }
    },
    required: ["item", "store"]
  },
  outputSchema: {
    // Define what the tool returns
  }
});
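
The registration above describes the tool but not what happens when it is called. Here is a hedged sketch of how a handler and a feature check might fit together. It assumes the `navigator.modelContext` property name and an `execute` callback returning text content, as described in early WebMCP drafts; the spec is still moving, so treat those shapes as assumptions. buildAddItemTool, registerIfSupported, and myAddItem are hypothetical names.

```javascript
// Sketch only: assumes the `execute` callback and text-content result shape
// from early WebMCP drafts. Check the current spec before relying on it.
function buildAddItemTool(addItem) {
  return {
    name: "addItem",
    description: "Adds a new item to a specified grocery store list.",
    inputSchema: {
      type: "object",
      properties: {
        item: { type: "string", description: "The item to add." },
        store: { type: "string", description: "The store to add the item to." },
      },
      required: ["item", "store"],
    },
    // Called by the agent with arguments matching inputSchema.
    execute({ item, store }) {
      addItem(item, store);
      return { content: [{ type: "text", text: `Added ${item} to ${store}.` }] };
    },
  };
}

// Register only when the API is present, so the page still works in other browsers.
function registerIfSupported(nav, tool) {
  if (!nav || !nav.modelContext) return false;
  nav.modelContext.registerTool(tool);
  return true;
}

// In a page you might call:
//   registerIfSupported(window.navigator, buildAddItemTool(myAddItem));
// where `myAddItem` is a hypothetical function from your own app.
```

Passing the navigator object in as a parameter keeps the sketch testable outside a browser and makes the feature check explicit.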

2. Declarative Declaration (HTML)

This is where it gets genius. You can declare tools using simple HTML form elements.

<form>
  <input type="text" name="item"
         tool-name="addItem"
         tool-description="Adds an item to a grocery list."
         tool-param-title="Item Name"
         tool-param-description="The name of the grocery item.">
  <input type="text" name="store"
         tool-param-title="Store Name"
         tool-param-description="The store where the item should be added.">
  <button type="submit">Add Item</button>
</form>

The browser parses the HTML and infers the schema directly from your form. No extra schema definition is needed. It’s a beautiful meeting of worlds.
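
To give a feel for what "infers the schema" could mean, here is a rough sketch that turns a list of field descriptions into a JSON Schema object. This is not the browser's actual algorithm. inferInputSchema is a hypothetical helper, the plain objects stand in for input elements, and the `required` flag assumes the input carried the standard `required` attribute.

```javascript
// Hypothetical: one way a schema could be derived from form fields.
// Plain objects stand in for <input> elements so this runs without a DOM.
function inferInputSchema(fields) {
  // Map HTML input types to JSON Schema types; anything unknown becomes a string.
  const typeMap = { text: "string", number: "number", checkbox: "boolean" };
  const properties = {};
  const required = [];
  for (const field of fields) {
    properties[field.name] = {
      type: typeMap[field.type] || "string",
      title: field.title,
      description: field.description,
    };
    if (field.required) required.push(field.name);
  }
  return { type: "object", properties, required };
}

const schema = inferInputSchema([
  // `required: true` is an assumption; the form above does not mark fields required.
  { name: "item", type: "text", title: "Item Name",
    description: "The name of the grocery item.", required: true },
  { name: "store", type: "text", title: "Store Name",
    description: "The store where the item should be added." },
]);
console.log(JSON.stringify(schema, null, 2));
```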

Here’s a visual breakdown of the flow:

graph TD
    A[User issues a natural language prompt] --> B{AI Model};
    B --> C{Does the current website support WebMCP?};
    C -- Yes --> D[Discover available tools via JS/HTML];
    C -- No --> E[Fall back to slower methods like screen scraping];
    D --> F["AI selects the appropriate tool(s)"];
    F --> G[AI executes the tool with required parameters];
    G --> H[Website performs the action];
    H --> I[Result is returned to the user/AI];
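
The same flow can be sketched in code. Everything here is a hypothetical stand-in: `page.tools` represents whatever the browser discovered, and `plan` represents the tool calls the model chose for the bananas prompt from the demo. A real agent would get both from the browser and the model.

```javascript
// Sketch of the flow above, with hypothetical `page` and `plan` objects.
function runPlan(page, plan) {
  const tools = page.tools || [];
  if (tools.length === 0) {
    // No WebMCP support: the agent would fall back to DOM or screenshot automation.
    return { mode: "fallback", results: [] };
  }
  const byName = new Map(tools.map((tool) => [tool.name, tool]));
  const results = plan.map(({ name, args }) => {
    const tool = byName.get(name);
    if (!tool) throw new Error(`Tool not published by this page: ${name}`);
    return tool.execute(args);
  });
  return { mode: "webmcp", results };
}

// Example: the calls a model might choose for the bananas prompt.
const list = { Costco: [], "Whole Foods": [] };
const page = {
  tools: [
    {
      name: "addItem",
      execute: ({ item, store }) => {
        list[store].push(item);
        return `added ${item}`;
      },
    },
    {
      name: "moveItem",
      execute: ({ item, from, to }) => {
        list[from] = list[from].filter((entry) => entry !== item);
        list[to].push(item);
        return `moved ${item}`;
      },
    },
  ],
};

const outcome = runPlan(page, [
  { name: "addItem", args: { item: "bananas", store: "Costco" } },
  { name: "moveItem", args: { item: "bananas", from: "Costco", to: "Whole Foods" } },
]);
console.log(outcome.mode, list);
```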

Key Benefits of WebMCP

A Mixed UI Approach

Not every interaction needs to be a UI widget embedded in a chat. Sometimes users want to visit the actual website for the full experience, and site owners want them there for the full UI and any upselling. WebMCP lets users interact with the familiar website using powerful natural language.

It’s much faster to type, “Add items for chicken noodle soup,” than to manually add six different items.

Speed and Efficiency

This is way faster than AI browsers that have to infer how to operate a page from its DOM or screenshots. Those tools are often painfully slow.

With WebMCP, an action can take just a few seconds.

“Add a new store, a drugstore. Add lip balm.”

Five seconds later, it’s done. A new store is created, and the item is added.

Token Efficiency

You’re not sending the entire DOM tree or a screenshot to the AI. You’re only sending the tool definitions and possible options. This is much more token-efficient and cost-effective.
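
As a rough illustration of the difference in size, here is a sketch that compares a tool definition against a page snapshot. It is a back-of-the-envelope estimate, not a measurement: the 4-characters-per-token figure is a common rule of thumb rather than a real tokenizer, estimateTokens is a hypothetical helper, and the markup is synthetic.

```javascript
// Rough illustration only: ~4 characters per token is a rule of thumb,
// not a real tokenizer, and the markup below is synthetic.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const toolDefinitions = JSON.stringify([
  {
    name: "addItem",
    description: "Adds an item to a store list.",
    inputSchema: {
      type: "object",
      properties: { item: { type: "string" }, store: { type: "string" } },
      required: ["item", "store"],
    },
  },
]);

// Synthetic stand-in for a rendered board: 50 list items with wrapper markup.
const domSnapshot = Array.from(
  { length: 50 },
  (_, i) =>
    `<li class="item" data-id="${i}"><input type="checkbox"><span>Item ${i}</span><button>Delete</button></li>`
).join("");

console.log("tool definitions:", estimateTokens(toolDefinitions));
console.log("DOM snapshot:", estimateTokens(domSnapshot));
```

The tool definitions also stay the same size as the list grows, while a DOM dump grows with every item on the board.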

Framework Integration

This seems ripe for frameworks to implement. Frameworks already have your schemas, validation, and UI layers. Taking the final step to publish these as WebMCP tools would be incredibly easy. It eliminates the need to spin up and host a separate MCP server.

Lingering Questions

Since this is a very new specification, some questions remain.

Will cross-app interactions be possible? Users want to chain commands across different services. For example: “Look at my calendar for dinner plans this week, then add the ingredients to my grocery list.” I assume this will be possible, with a central chat app visiting multiple sites to discover and use their tools.
Will it be headless? I assume at some point, this will run in a headless browser for seamless background operations.

Final Take: Adapting the Web for AI

I think this is a great way for the web to adapt to AI.

It’s a practical bridge for developers. You don’t need everyone to publish a dedicated MCP server. Just like with responsive design, you can make a few changes to your existing site, and it’s ready for a new paradigm.

However, the history of web APIs offers a cautionary tale.

Note: Early on, many popular websites had a free, open API. We saw hundreds of Twitter clients and mashups. Over time, companies locked them down to control the user experience and monetize their platforms. Reddit, Twitter, and Instagram APIs are now either gone or prohibitively expensive.

Big companies want you on their platform, using it their way. They may not be eager to let their service be used as a simple utility.

So, while WebMCP might not be the final endgame, it’s a fantastic step forward for developers who do want AI to interact with their applications.