Run Any LLM Locally in 5 Steps with Docker's New Model Runner
A while back, an article was published showing how to run any open-source large language model locally. More than six months have passed since then, and the platform it covered has become outdated. That's why today's article showcases a newer and much easier way to install and run models locally.
And no, it's not Ollama. We'll get to why in a second. It's Docker's brand-new Model Runner. Docker just released this feature, and it's a game-changer for local model deployment. In this article, we'll walk through how it works and how to connect it to a web UI so you can interact with your models more easily.
A Modern Approach to Local AI
Docker Model Runner is a modern, local-first way to run AI models, giving you full control, zero hassle, and seamless integration with your existing Docker workflows. It's 100% local, fully private, and OpenAI API compatible right out of the box.
The best part? It's built directly into Docker Desktop. This means you don't need to have CUDA installed or mess with GPU drivers. Just enable it, and you can easily start working with large language models. You can pull models from Docker Hub or Hugging Face with a single command and then run them from your terminal, your containers, or even integrate them directly into GenAI apps within seconds.
Why Not Just Use Ollama?
You might be wondering, why not just use Ollama? That's a fair question. The main reason is that Docker Model Runner is built for developers already working with Docker. If you're building and deploying apps using containers, this fits directly into your existing pipeline. It's native, scalable, and production-friendly.
Ollama is excellent for getting started, but it's more limited when it comes to integrations and deployments in real-world applications. Docker Model Runner, on the other hand, makes it easy to scale, compose, and integrate into full-stack projects without changing your existing workflow. Plus, it's easier to install and work with. You'll have access to the same range of large language models as you would with Ollama, since it connects not only to Docker Hub's catalog of models but also to Hugging Face model cards.
System Requirements
Docker Model Runner itself works on Windows, macOS, and Linux, and it is completely free to use; individual models have their own RAM and disk-space requirements, so check the model card before pulling. You will also need Docker Desktop installed.
Getting Started: A 5-Step Guide
1. Enable the Feature in Docker Desktop
After installing Docker Desktop, navigate to Settings > Beta Features. From there, enable Docker Model Runner. You can also enable host-side TCP support to expose it on a specific port, as well as GPU-backed inference if you have the hardware.
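With host-side TCP enabled, Model Runner exposes its OpenAI-compatible API over plain HTTP. As a quick sanity check, you can hit the chat-completions endpoint with curl. This sketch assumes the default port 12434, the documented `/engines/v1` path, and a model already pulled as `ai/smollm2` (see step 4); adjust all three if your setup differs:

```bash
# Query the OpenAI-compatible endpoint exposed on the host
# (assumes default port 12434 and a pulled ai/smollm2 model)
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```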
2. Verify the Installation
To confirm it's working, open your terminal and run the following command:
```bash
docker model status
```
You should see a message indicating that the Docker Model Runner is running.
3. Explore and Install Models
Next, go to the main Docker Desktop dashboard and click on the Models tab. Here, you can view all your local models. You can install models manually or use Docker Hub to easily pull and install them.
Understanding Docker's OCI-Based Model Format
An important aspect of Docker Hub is its use of a new OCI-based packaging format for models, meaning models from Docker Hub are packaged in a standardized way. The format includes only the bare essentials: the model weights, a simple manifest, and a license file. It doesn't bundle an inference server or an API wrapper. This approach gives you full control to pair the model with the runtime of your choice, such as TGI or llama.cpp, making the process clean, modular, and production-friendly. It simplifies model management significantly.
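Because the format is OCI-based, a model reference looks just like an image reference. Models curated by Docker live under the `ai/` namespace on Docker Hub, and GGUF repositories on Hugging Face can be pulled by prefixing `hf.co/`. The repository below is only an illustration; substitute any GGUF model card you like:

```bash
# Pull a GGUF model straight from Hugging Face via its OCI reference
# (illustrative repository; use any GGUF model card)
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```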
4. Pulling a Model
Installing a model is straightforward. Click on a model card to get more information, including its variants, size, and quantization options. For example, while a model like `deepseek-coder-v2` is massive, you can choose among different parameter sizes and install the one that fits your needs. To install a model like `smollm2`, simply select the desired version (e.g., `latest`) and click Pull. The model installs with a single click.
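If you prefer the terminal, the same pull works from the command line. This assumes the model is published under Docker Hub's `ai/` namespace as `ai/smollm2`; check the model card for the exact name and available tags:

```bash
# Pull a specific tag of the model from Docker Hub
docker model pull ai/smollm2:latest

# Remove a model later to reclaim disk space
docker model rm ai/smollm2:latest
```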
5. Running the Model
Once installed, you can run the model in several ways. You can click Run to use the built-in chatbot within Docker Desktop, or you can run it from the terminal.
To see a list of your installed models, use this command:
```bash
docker model list
```
To run a specific model from the terminal, use:
```bash
docker model run <your-model-name>
```
However, for simple interactions, using the Docker Desktop interface is often more convenient. For example, after starting the `smollm2` model, you can send it a message like, "Hi, how are you?" and it will respond quickly. This demonstrates how easily you can get models up and running within the Docker Desktop application.
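From the terminal, the equivalent interaction looks like this, assuming the model was pulled as `ai/smollm2` (with no prompt argument, `docker model run` drops you into an interactive chat):

```bash
# One-shot prompt: prints the model's reply and exits
docker model run ai/smollm2 "Hi, how are you?"

# Start an interactive chat session instead
docker model run ai/smollm2
```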
Integrating with Open WebUI for an Enhanced Experience
For a more advanced interface, you can pair Docker models with the Open WebUI project. This provides a beautiful, user-friendly, self-hosted web UI for interacting with your models. It includes numerous extendable features, a built-in RAG inference engine, and is 100% private.
Currently, Open WebUI requires a specific configuration to work with Model Runner. A public repository provides a `compose.yaml` file for this purpose. By default, this file is set up to install the `gemma-2-9b-it` model, but you can easily change it to any model you prefer. The configuration uses the standard OpenAI API base URL that Docker Model Runner provides.
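As a rough sketch of what such a compose file can look like (this is an illustrative configuration, not the repository's exact file: `OPENAI_API_BASE_URL` is a standard Open WebUI variable, and `model-runner.docker.internal` is the hostname Docker documents for reaching Model Runner from inside containers; verify both against the current docs):

```yaml
# compose.yaml - illustrative sketch, not the repository's exact file
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      # Point Open WebUI at Model Runner's OpenAI-compatible endpoint
      - OPENAI_API_BASE_URL=http://model-runner.docker.internal/engines/v1
      # Model Runner doesn't validate keys, but Open WebUI expects one to be set
      - OPENAI_API_KEY=docker
    volumes:
      - open-webui:/app/backend/data

volumes:
  open-webui:
```

After saving a file like this, running `docker compose up` in the same directory brings the UI up on http://localhost:3000.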
To use the provided configuration with `gemma-2-9b-it`, you can run the `docker compose up` command directly. If you want to use a different model, you would:
1. Edit the `compose.yaml` file and replace the model name with your desired model card from Docker Model Runner.
2. Save the `compose.yaml` file locally.
3. Run `docker compose -f /path/to/your/compose.yaml up` in your terminal.
After the process completes, Open WebUI will be running. You'll just need to create a local account, and then you can start interacting with your models through this powerful web interface. That's how easily this web UI works with Docker Model Runner. Docker's new Model Runner is a fantastic, straightforward way to run any model locally, and it is well worth exploring.