Run Any LLM Locally in 5 Steps with Docker's New Model Runner
A while back, an article was published showing how to run any open-source large language model locally. More than six months have passed since then, and the platform it covered has become outdated. That's why today's article showcases a newer and much easier way to install and run models locally.
And no, it's not Ollama. We'll get to why in a second. It's Docker's brand-new Model Runner. Docker just released this feature, and it's a game-changer for local model deployment. In this article, we'll walk through how it works and how to connect it to a web UI so you can interact with your models more easily.
A Modern Approach to Local AI
Docker Model Runner is a modern, local-first way to run AI models, giving you full control, zero hassle, and seamless integration with your existing Docker workflows. It's 100% local, fully private, and OpenAI API compatible right out of the box.
The best part? It's built directly into Docker Desktop. This means you don't need to have CUDA installed or mess with GPU drivers. Just enable it, and you can easily start working with large language models. You can pull models from Docker Hub or Hugging Face with a single command and then run them from your terminal, your containers, or even integrate them directly into GenAI apps within seconds.
Why Not Just Use Ollama?
You might be wondering, why not just use Ollama? That's a fair question. The main reason is that Docker Model Runner is built for developers already working with Docker. If you're building and deploying apps using containers, this fits directly into your existing pipeline. It's native, scalable, and production-friendly.
Ollama is excellent for getting started, but it's more limited when it comes to integrations and deployments in real-world applications. Docker Model Runner, on the other hand, makes it easy to scale, compose, and integrate into full-stack projects without changing your existing workflow. Plus, it's easier to install and work with. You'll have access to the same range of large language models as you would with Ollama, since it connects not only to Docker Hub's catalog of models but also to Hugging Face model cards.
System Requirements
Docker Model Runner itself works on Windows, macOS, and Linux, and it is completely free to use; individual models have their own RAM and disk-space requirements, so check the model card before pulling. You will also need Docker Desktop installed.
Getting Started: A 5-Step Guide
1. Enable the Feature in Docker Desktop
After installing Docker Desktop, navigate to Settings > Beta Features. From there, enable Docker Model Runner. You can also enable host-side TCP support to expose it on a specific port, as well as GPU-backed inference if you have the hardware.
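With host-side TCP enabled, Model Runner exposes its OpenAI-compatible API over plain HTTP. As a quick sanity check, you can hit the chat-completions endpoint with curl. This sketch assumes the default port 12434, the documented `/engines/v1` path, and a model already pulled as `ai/smollm2` (see step 4); adjust all three if your setup differs:

```bash
# Query the OpenAI-compatible endpoint exposed on the host
# (assumes default port 12434 and a pulled ai/smollm2 model)
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```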
2. Verify the Installation
To confirm it's working, open your terminal and run the following command:
```bash
docker model status
```
You should see a message indicating that the Docker Model Runner is running.
3. Explore and Install Models
Next, go to the main Docker Desktop dashboard and click on the Models tab. Here, you can view all your local models. You can install models manually or use Docker Hub to easily pull and install them.
Understanding Docker's OCI-Based Model Format
An important aspect of Docker Hub is its use of a new OCI-based packaging format for models, meaning models from Docker Hub are packaged in a standardized way. The format includes only the bare essentials: the model weights, a simple manifest, and a license file. It doesn't bundle an inference server or an API wrapper. This approach gives you full control to pair the model with the runtime of your choice, such as TGI or llama.cpp, making the process clean, modular, and production-friendly. It simplifies model management significantly.
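Because the format is OCI-based, a model reference looks just like an image reference. Models curated by Docker live under the `ai/` namespace on Docker Hub, and GGUF repositories on Hugging Face can be pulled by prefixing `hf.co/`. The repository below is only an illustration; substitute any GGUF model card you like:

```bash
# Pull a GGUF model straight from Hugging Face via its OCI reference
# (illustrative repository; use any GGUF model card)
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```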
4. Pulling a Model
Installing a model is straightforward. Click on a model card to get more information, including its variants, size, and quantization options. For example, while a model like `deepseek-coder-v2` is massive, you can choose among different parameter sizes and install the one that fits your needs. To install a model like `smollm2`, simply select the desired version (e.g., `latest`) and click Pull. The model installs with a single click.
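If you prefer the terminal, the same pull works from the command line. This assumes the model is published under Docker Hub's `ai/` namespace as `ai/smollm2`; check the model card for the exact name and available tags:

```bash
# Pull a specific tag of the model from Docker Hub
docker model pull ai/smollm2:latest

# Remove a model later to reclaim disk space
docker model rm ai/smollm2:latest
```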
5. Running the Model
Once installed, you can run the model in several ways. You can click Run to use the built-in chatbot within Docker Desktop, or you can run it from the terminal.
To see a list of your installed models, use this command:
```bash
docker model list
```
To run a specific model from the terminal, use:
```bash
docker model run <your-model-name>
```
However, for simple interactions, using the Docker Desktop interface is often more convenient. For example, after starting the `smollm2` model, you can send it a message like, "Hi, how are you?" and it will respond quickly. This demonstrates how easily you can get models up and running within the Docker Desktop application.
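From the terminal, the equivalent interaction looks like this, assuming the model was pulled as `ai/smollm2` (with no prompt argument, `docker model run` drops you into an interactive chat):

```bash
# One-shot prompt: prints the model's reply and exits
docker model run ai/smollm2 "Hi, how are you?"

# Start an interactive chat session instead
docker model run ai/smollm2
```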
Integrating with Open WebUI for an Enhanced Experience
For a more advanced interface, you can pair Docker models with the Open WebUI project. This provides a beautiful, user-friendly, self-hosted web UI for interacting with your models. It includes numerous extendable features, a built-in RAG inference engine, and is 100% private.
Currently, Open WebUI requires a specific configuration to work with Model Runner. A public repository provides a `compose.yaml` file for this purpose. By default, this file is set up to install the `gemma-2-9b-it` model, but you can easily change it to any model you prefer. The configuration uses the standard OpenAI API base URL that Docker Model Runner provides.
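As a rough sketch of what such a compose file can look like (this is an illustrative configuration, not the repository's exact file: `OPENAI_API_BASE_URL` is a standard Open WebUI variable, and `model-runner.docker.internal` is the hostname Docker documents for reaching Model Runner from inside containers; verify both against the current docs):

```yaml
# compose.yaml - illustrative sketch, not the repository's exact file
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      # Point Open WebUI at Model Runner's OpenAI-compatible endpoint
      - OPENAI_API_BASE_URL=http://model-runner.docker.internal/engines/v1
      # Model Runner doesn't validate keys, but Open WebUI expects one to be set
      - OPENAI_API_KEY=docker
    volumes:
      - open-webui:/app/backend/data

volumes:
  open-webui:
```

After saving a file like this, running `docker compose up` in the same directory brings the UI up on http://localhost:3000.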
To use the provided configuration with `gemma-2-9b-it`, you can run the `docker compose up` command directly. If you want to use a different model, you would:
1. Edit the `compose.yaml` file and replace the model name with your desired model card from Docker Model Runner.
2. Save the `compose.yaml` file locally.
3. Run `docker compose -f /path/to/your/compose.yaml up` in your terminal.
After the process completes, Open WebUI will be running. You'll just need to create a local account, and then you can start interacting with your models through this powerful web interface. That's how easily this web UI works with Docker Model Runner. Docker's new Model Runner is a fantastic, straightforward way to run any model locally, and it is well worth exploring.