Note: The tool mentioned in this article has gone through several rebrands, from Clawdbot to Moltbot, and now to its current name, OpenClaw. This article has been updated to reflect the new name. For the latest information, please visit the official website: openclaw.ai.
Exploring new technology should be about experimentation and enjoyment. This article dives into a highly requested topic: running a local model with OpenClaw. While this publication has covered setting up local servers with tools like Ollama and LM Studio before, this guide will focus specifically on integrating them with OpenClaw for a powerful, private AI experience.
Choosing Your Local Server: Ollama vs. LM Studio
When setting up a local large language model (LLM), the first choice is the server. Personally, I recommend Ollama for a couple of key reasons.
Ollama offers remarkable flexibility by providing two types of models. First, you have models that are 100% local and run entirely on your machine, such as GLM 4.7 Flash, which we will use in this guide. Second, it provides access to cloud-based open-source models. This hybrid approach gives you a wide range of options. In contrast, LM Studio is strictly for 100% local models.
The setup process for both is nearly identical; the primary difference lies in the port numbers used to communicate with the local model.
Setting Up Ollama
With our choice made, let’s get Ollama configured. You can either download the client directly from the official website (the recommended approach) or install it using Homebrew.
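If you go the Homebrew route, the minimal steps look something like this (the formula is simply named ollama; with a CLI-only install you start the server yourself, whereas the desktop app from the website manages it for you):
brew install ollama
ollama serve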
Once installed, open your terminal and pull the model you wish to run.
ollama pull glm:4.7-flash
This command downloads the model to your machine. Ollama also lists a cloud-hosted variant of the model, which may be more powerful and support the full context length.
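To confirm the download finished, you can list everything Ollama has stored locally:
ollama list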
A critical step: in the Ollama application, you must increase the context length. By default, it’s set to a low value of around 4,000 tokens. Max it out to ensure you don’t run into limitations.
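If you would rather do this from the terminal, you can start an interactive session and raise the num_ctx parameter at the prompt; 32768 here is just an example value. (Newer Ollama releases also accept an OLLAMA_CONTEXT_LENGTH environment variable, but that is version-dependent.)
ollama run glm:4.7-flash
/set parameter num_ctx 32768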
The creator of OpenClaw, Peter Steinberger, recommends the MiniMax M2 model, which is available in the cloud. If you prefer a completely local alternative and are using LM Studio, you can find quantized MiniMax M2 builds such as 8-bit or 6-bit versions.
To test the local model, you can run it directly in your terminal.
ollama run glm:4.7-flash "Tell me a joke"
You’ll notice your machine’s fans spin up as the GPU engages. This is all happening 100% locally, ensuring your prompts and data remain private.
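You can also ask Ollama what it currently has loaded and whether the model landed on the GPU or spilled over to the CPU:
ollama ps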
To confirm the server is running correctly, navigate to http://localhost:11434 in your browser. You should see a message confirming “Ollama is running.”
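Since OpenClaw will talk to this server through an OpenAI-style API (as we will configure below), it is also worth sending one request by hand. The /v1/chat/completions route ships with current Ollama releases; the prompt is just a placeholder.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "glm:4.7-flash", "messages": [{"role": "user", "content": "Tell me a joke"}]}'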
Configuring OpenClaw for Local Models
Now for the magic. In your OpenClaw config.json file, you need to define the connection to your local LLM server.
Here is a sample configuration snippet. Note that if you were using LM Studio instead, the default port would be 1234.
{
  "llm": {
    "base_url": "http://localhost:11434",
    "api": "openai.completions",
    "models": {
      "glm:4.7-flash": {
        "reasoning": false
      }
    }
  },
  "agents": {
    "models": {
      "default": "ollama/glm:4.7-flash",
      "local": "ollama/glm:4.7-flash"
    }
  }
}
In this configuration:
- base_url: Points to your running Ollama instance.
- api: Specifies the API format to use.
- models: Lists the available models you want OpenClaw to recognize.
- agents.models: Defines aliases. Here, local is an alias for our Ollama model, making it easy to reference.
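For comparison, here is a rough sketch of the same snippet pointed at LM Studio instead. The port (1234) is LM Studio's default for its local OpenAI-compatible server; the key structure simply mirrors the Ollama example above, and the model name and the lmstudio/ prefix are placeholders, so check OpenClaw's documentation for the exact identifiers it expects.
{
  "llm": {
    "base_url": "http://localhost:1234",
    "api": "openai.completions",
    "models": {
      "minimax-m2-8bit": {
        "reasoning": false
      }
    }
  },
  "agents": {
    "models": {
      "default": "lmstudio/minimax-m2-8bit",
      "local": "lmstudio/minimax-m2-8bit"
    }
  }
}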
With Ollama running, the hardest part is over. You can now simply ask OpenClaw to handle the integration for you by instructing it to link to the Ollama model you’re using.
Switching Models on the Fly
Inside the OpenClaw chat window, you can easily switch between models. To use your local model, type:
model glm:4.7-flash
OpenClaw will confirm the switch. If you want to revert to a faster, cloud-based model like Haiku, you can do so just as easily:
model haiku
Performance Considerations
When you switch back to a local model like GLM, you will notice a significant slowdown in response time. A request for a 500-word story that a cloud model like Haiku would handle almost instantly will show a noticeable delay with a local model.
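A rough way to gauge the gap on your own hardware is to time the same request against the local model; the numbers will vary widely with your GPU, the quantization, and the context length you configured.
time ollama run glm:4.7-flash "Write a 500-word story about a lighthouse keeper"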
This means local models are best suited for proactive, non-urgent tasks. Think of daily briefings, report generation, or other background processes where instant feedback isn’t critical. For anything requiring a quick, interactive response, it’s best to stick with a lighter model like Haiku. My recommendation is to leave a fast model as your default for general use.
Automating Model Selection with Skills and Cron Jobs
The real power of OpenClaw comes from its automation capabilities. Everything you configure, from skills to cron jobs, can be assigned a specific model.
For example, if you have a daily assessment cron job, you can instruct OpenClaw to use the local GLM model for that specific task, even if your default is Haiku.
In my daily assessment cron job, use the glm:4.7-flash model.
OpenClaw will confirm the change. Because this task runs in the background, the slower processing time is perfectly acceptable.
A Note on Intent-Based Model Switching
Currently, there isn’t an automatic way for OpenClaw to select a model based on the user’s intent. You could theoretically build a skill that first assesses the incoming request to determine if it needs a “heavy-duty” model (like Opus or a local LLM) or a “light” one (like Haiku).
However, this approach is self-defeating. The initial assessment would itself require an AI request, adding latency. By the time the intent is determined, you might as well have used the faster model to begin with.
Therefore, the most practical strategy is:
- Set a light, fast, and cheap model as your default for general queries.
- Configure specific skills, cron jobs, and other automated tasks to use your powerful local models on a case-by-case basis.
This approach gives you the best of both worlds: speed for interactive chats and the power and privacy of local LLMs for your automated workflows. Happy experimenting!