Monday, June 15, 2026

Adding a Web UI to Your Local LLM - Open WebUI, Docker & GPU Acceleration | EngineerVisions

Adding a Web UI to Your Local LLM — Open WebUI, Docker & GPU Acceleration

📌 Part 2 of the Local LLM Series
In the previous post, I covered how to install Ollama on Linux Mint and run Phi-4 Mini and Gemma3:4B locally on an MSI Katana with an RTX 3050. This post picks up where that left off - I briefly mentioned Open WebUI as a web interface but never went into depth about how to properly set it up, especially when things go wrong with Docker networking and GPU access. That is exactly what this post covers.

The terminal chat that Ollama provides works perfectly well for quick experiments, but for longer sessions - especially when you are comparing model outputs or picking up a conversation later - a proper browser-based interface makes a noticeable difference. Open WebUI is a self-hosted interface that looks and feels very much like ChatGPT or Claude, and it connects directly to your local Ollama instance. No data leaves your machine.

What I did not expect was the number of small but meaningful hurdles involved in getting it running correctly - particularly around Docker networking and getting the GPU accessible from inside the container. This post documents every step and every error I hit along the way.

Installation Options

Open WebUI supports several installation methods: Docker, pip, uv, and a standalone desktop app. I tried both Docker and the desktop app. They are functionally nearly identical, so I uninstalled the desktop app and stuck with Docker - it is easier to manage, restart, and update in place. Everything from here on assumes Docker.

📦 Available install methods: Docker (recommended for most users), pip, uv, and a native desktop app. Full documentation is at docs.openwebui.com.

Basic Docker Setup

Start by pulling the main image and running the container. The setup takes two commands:

Terminal
$ docker pull ghcr.io/open-webui/open-webui:main
Terminal
$ docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

# Open: http://localhost:3000

This starts the container named open-webui, binding port 3000 on your host to port 8080 inside the container. Open a browser tab and navigate to localhost:3000 - you should see the Open WebUI login screen.

You will notice, however, that no Ollama models appear in the model selector. The interface loads but cannot reach your local Ollama instance. This is a networking problem, and it took me a while to track down exactly why.

The Networking Problem

The root cause is straightforward once you see it. By default, Ollama binds only to 127.0.0.1 - meaning it only accepts connections from the same machine it is running on. You can confirm this with:

Terminal
$ ss -tulpn | grep 11434
tcp   LISTEN 0      4096       127.0.0.1:11434      0.0.0.0:*
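If you want to check the bind address from a script rather than by eye, something like the following could work. It is a minimal sketch: ollama_bind_scope is a hypothetical helper name, and it assumes the ss output format shown above, with the local address in an addr:port field and Ollama on its default port 11434.

```shell
# Hypothetical helper: read `ss -tln` style output on stdin and report how
# the listener on port 11434 (Ollama's default port) is bound.
ollama_bind_scope() {
  awk '
    {
      # The local address is the first field that ends in :11434.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /:11434$/) {
          addr = $i
          sub(/:11434$/, "", addr)   # strip the port, keep the address
          found = 1
          exit                       # jump to END
        }
      }
    }
    END {
      if (!found)
        print "not-listening"
      else if (addr == "127.0.0.1" || addr == "[::1]")
        print "loopback-only"
      else if (addr == "0.0.0.0" || addr == "*" || addr == "[::]")
        print "all-interfaces"
      else
        print "bound-to " addr
    }
  '
}

# Usage on a live system:
#   ss -tln | ollama_bind_scope
```

A result of loopback-only is the situation described here: the server is up, but only processes on the host itself can connect.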

The Docker container has its own isolated network namespace - its localhost is not the host machine's localhost. Even though I had passed --add-host=host.docker.internal:host-gateway when starting the container, that flag only adds a hosts entry inside the container that maps host.docker.internal to the host's gateway address (on the default bridge network, the docker0 IP). Ollama is still only listening on 127.0.0.1, so when Open WebUI inside the container tries to reach http://host.docker.internal:11434, nothing is listening on that address and the connection is refused. The architecture looks like this:

Network Diagram
            Host machine (Linux Mint)

                Ollama
                  |
                  |
            127.0.0.1:11434
                  |
                  X  ← Docker cannot reach this
                  |
        -------------------------
        | Docker container      |
        | Open WebUI            |
        |                       |
        | localhost != host     |
        -------------------------

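The fix is to make Ollama listen on the Docker bridge interface instead, so that containers on the default Docker network can actually reach it. Ollama then no longer listens on 127.0.0.1, so anything on the host that talks to it has to use the bridge address as well.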

Fixing the Connection - Binding Ollama to the Docker Bridge

First, find the IP address of the Docker bridge interface on your host:

Terminal
$ ip addr show docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether de:09:f4:3c:a7:23 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
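The address can also be pulled out programmatically, which helps if you are scripting the setup. This is a minimal sketch, assuming the ip addr output format shown above; bridge_ipv4 is a hypothetical helper name, and the bridge address on your machine may differ from mine.

```shell
# Hypothetical helper: print the first IPv4 address found in `ip addr show`
# output read from stdin (the "inet 172.17.0.1/16 ..." line), without the
# /prefix-length suffix.
bridge_ipv4() {
  awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2; exit }'
}

# Usage on a live system:
#   BRIDGE_IP=$(ip addr show docker0 | bridge_ipv4)
#   echo "Docker bridge IP: $BRIDGE_IP"
```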

The Docker bridge IP here is 172.17.0.1. Now edit the Ollama systemd service to tell it to listen on that address:

Terminal
$ sudo systemctl edit ollama

This opens a drop-in override file in your editor. Add the following between the two comment markers:

/etc/systemd/system/ollama.service.d/override.conf
### Editing /etc/systemd/system/ollama.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Service]
Environment="OLLAMA_HOST=172.17.0.1"

### Edits below this comment will be discarded

Save and exit. You should see a confirmation message: Successfully installed edited file '/etc/systemd/system/ollama.service.d/override.conf'. Now reload the daemon and restart both Ollama and the Open WebUI container:

Terminal
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
$ docker restart open-webui
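If you would rather script the override than go through the interactive editor, the same drop-in can be written directly and followed by the reload and restart commands above. This is a minimal sketch: write_ollama_override is a hypothetical helper, and it assumes the drop-in path shown earlier and a bridge IP you have already looked up.

```shell
# Hypothetical helper: write the same drop-in file that `systemctl edit`
# creates. Arguments: drop-in directory, address for Ollama to bind to.
write_ollama_override() {
  dir=$1
  addr=$2
  mkdir -p "$dir" || return 1
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$addr" > "$dir/override.conf"
}

# Usage (from a root shell, since the real path is under /etc):
#   write_ollama_override /etc/systemd/system/ollama.service.d 172.17.0.1
```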

Refresh localhost:3000 in your browser. Your locally installed Ollama models should now appear in the model selector and one will be pre-selected by default.
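If the models still do not show up, it helps to test the connection from inside the container rather than guessing. The sketch below assumes the container is named open-webui and that curl is available in the image (I have not verified that for every tag). The helper names are hypothetical. The exit codes are curl's documented ones, and /api/version is an Ollama API endpoint.

```shell
# Hypothetical helper: translate a curl exit status into a short diagnosis.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "connection refused or host unreachable" ;;
    28) echo "timed out" ;;
    *)  echo "failed with exit code $1" ;;
  esac
}

# Hypothetical helper: probe Ollama from inside the Open WebUI container.
# docker exec passes through the exit status of the command it runs.
probe_ollama_from_container() {
  status=0
  docker exec open-webui \
    curl -s -o /dev/null --max-time 5 \
    http://host.docker.internal:11434/api/version || status=$?
  explain_curl_exit "$status"
}

# Usage:
#   probe_ollama_from_container
```

On the host side, the ollama CLI also reads OLLAMA_HOST, so a command such as OLLAMA_HOST=172.17.0.1 ollama list targets the rebound server.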

Running Open WebUI in GPU Mode

The standard :main image runs the Open WebUI frontend and backend on CPU. Ollama itself runs on the host and uses the GPU regardless of which image you pick. If you want the WebUI's own AI features - things like document embeddings, built-in image generation pipelines, and local model inference directly through the container - to use your GPU as well, you need the CUDA variant.

Pulling the CUDA Image

Terminal
$ docker pull ghcr.io/open-webui/open-webui:v0.9.6-cuda

Fair warning: this image is large. It ships with full CUDA runtime libraries bundled inside.

Image                     Content Size   Disk Usage
open-webui:v0.9.6-cuda    5.63 GB        16.9 GB

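Once pulled, stop and remove the existing :main container with docker rm -f open-webui, since the name and port would otherwise conflict. The open-webui volume, and with it your chats and settings, is kept. Then try running the CUDA image with GPU access: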

Terminal
$ docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  --gpus all \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:v0.9.6-cuda

If everything is configured correctly, the container starts silently and you can open your browser. However, on a fresh Linux system, you will likely hit this error:

Docker Error
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Run 'docker run --help' for more information

This means Docker cannot find the NVIDIA Container Toolkit. Your NVIDIA driver is working fine - but there is a separate middleware layer called nvidia-container-toolkit that lets Docker communicate with the GPU. It is not installed by default even when your driver is.

Installing the NVIDIA Container Toolkit

First, confirm your NVIDIA driver is still healthy:

Terminal
$ nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8              7W /   60W |     196MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1277      G   /usr/lib/xorg/Xorg                        4MiB |
|    0   N/A  N/A           19922      C   /usr/local/bin/python3                  174MiB |
+-----------------------------------------------------------------------------------------+

Good. Now install the toolkit - this adds the NVIDIA repository and installs the package:

Terminal
# Add the NVIDIA Container Toolkit GPG key and repository
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

$ sudo apt update
$ sudo apt install -y nvidia-container-toolkit

Once installed, configure Docker to use the NVIDIA runtime and restart the Docker daemon:

Terminal
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
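A quick way to confirm that the configure step took effect is to ask Docker which runtimes it knows about. This is a minimal sketch: nvidia_runtime_status is a hypothetical helper, and it assumes that nvidia-ctk registered the runtime under the name nvidia, which is its default.

```shell
# Hypothetical helper: report whether Docker lists an "nvidia" runtime.
# `docker info --format '{{json .Runtimes}}'` prints the runtimes as JSON,
# keyed by runtime name.
nvidia_runtime_status() {
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'; then
    echo "nvidia runtime registered"
  else
    echo "nvidia runtime missing"
  fi
}

# Usage:
#   nvidia_runtime_status
```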

Verifying GPU Access Inside Docker

Before re-running the Open WebUI container, verify that Docker can now see the GPU. This command pulls a minimal CUDA image and runs nvidia-smi inside it - if the output matches what you see on the host, everything is wired up correctly:

Terminal
$ docker run --rm --gpus all nvidia/cuda:12.9.2-base-ubuntu22.04 nvidia-smi
Unable to find image 'nvidia/cuda:12.9.2-base-ubuntu22.04' locally
12.9.2-base-ubuntu22.04: Pulling from nvidia/cuda
1dca0c9bc5f7: Pull complete
504205c74aa4: Pull complete
33b2e09f3b0d: Pull complete
95b418228d05: Pull complete
12eeb868872b: Pull complete
40d16f30db40: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:8cd34c18c70fcb862f9829e7a2a04597feeb5f5d221904c77610b60c78c00ba4
Status: Downloaded newer image for nvidia/cuda:12.9.2-base-ubuntu22.04
Sun Jun 14 15:22:37 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   46C    P0             12W /   60W |      14MiB /   4096MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

The GPU is visible from inside the container. Now remove the old errored-out container and start the CUDA-enabled one properly:
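For the removal step, a small guard avoids an error when the container is already gone. This is a sketch with a hypothetical helper name. docker rm -f leaves named volumes alone, so the open-webui data volume survives.

```shell
# Hypothetical helper: remove a container by name if it exists. Named
# volumes are not touched, so chat history and settings are preserved.
remove_container_if_present() {
  name=$1
  if docker container inspect "$name" >/dev/null 2>&1; then
    docker rm -f "$name" >/dev/null && echo "removed $name"
  else
    echo "no container named $name"
  fi
}

# Usage:
#   remove_container_if_present open-webui
```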

Running the GPU-Enabled Container

Terminal
$ docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  --gpus all \
  -v open-webui:/app/backend/data \
  -e HF_TOKEN=<YOUR_HUGGING_FACE_TOKEN> \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:v0.9.6-cuda
💡 Add a Hugging Face token: Passing -e HF_TOKEN=<token> with a read-only Hugging Face token speeds up the initial startup significantly. Open WebUI downloads sentence-transformer embedding models from Hugging Face on first boot, and an authenticated request avoids rate limiting. Get a free read token at huggingface.co/settings/tokens.
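One way to keep the token off the command line is to export it in your shell and pass only the variable name. docker run -e NAME with no value forwards the variable from the calling environment. The wrapper below is a sketch with a hypothetical function name. It reuses the flags from the command above unchanged.

```shell
# Hypothetical wrapper around the docker run command above. The token is
# taken from the caller's environment, so it never appears in the arguments.
start_webui_cuda() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
  export HF_TOKEN   # must be exported for the docker CLI to see it
  docker run -d \
    -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    --gpus all \
    -v open-webui:/app/backend/data \
    -e HF_TOKEN \
    --name open-webui \
    --restart always \
    ghcr.io/open-webui/open-webui:v0.9.6-cuda
}

# Usage:
#   HF_TOKEN=your-read-token start_webui_cuda
```

This also sidesteps the angle brackets in the placeholder, which the shell would treat as redirections if pasted literally.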

You can tail the container logs to watch the startup sequence. The key lines to look for are CUDA enabled and the server start confirmation:

Terminal - docker logs -f open-webui
No WEBUI_SECRET_KEY environment variable set, loading from file.
Generating new WEBUI_SECRET_KEY...
Loading WEBUI_SECRET_KEY from .webui_secret_key
CUDA enabled — extending LD_LIBRARY_PATH for torch/cudnn libraries.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [open_webui.env] 

WARNING: CORS_ALLOW_ORIGIN IS SET TO '*' - NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.

WARNI [langchain_community.utils.user_agent] USER_AGENT environment variable not set, consider setting it to identify your requests.

 ██████╗ ██████╗ ███████╗███╗   ██╗    ██╗    ██╗███████╗██████╗ ██╗   ██╗██╗
██╔═══██╗██╔══██╗██╔════╝████╗  ██║    ██║    ██║██╔════╝██╔══██╗██║   ██║██║
██║   ██║██████╔╝█████╗  ██╔██╗ ██║    ██║ █╗ ██║█████╗  ██████╔╝██║   ██║██║
██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║    ██║███╗██║██╔══╝  ██╔══██╗██║   ██║██║
╚██████╔╝██║     ███████╗██║ ╚████║    ╚███╔███╔╝███████╗██████╔╝╚██████╔╝██║
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝     ╚══╝╚══╝ ╚══════╝╚═════╝  ╚═════╝ ╚═╝


v0.9.6 - building the best AI user interface.

https://github.com/open-webui/open-webui

Fetching 30 files: 100%|██████████| 30/30 [01:56<00:00,  3.90s/it]
Loading weights: 100%|██████████| 103/103 [00:00<00:00, 5293.76it/s]
BertModel LOAD REPORT from: /app/backend/data/cache/embedding/models/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/1110a243fdf4706b3f48f1d95db1a4f5529b4d41
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED:	can be ignored when loading from different task/architecture; not ok if you expect identical arch.
INFO:     Started server process [1]
INFO:     Waiting for application startup.
2026-06-14 15:37:55.856 | INFO     | open_webui.utils.logger:start_logger:214 - GLOBAL_LOG_LEVEL: INFO
2026-06-14 15:37:55.857 | INFO     | open_webui.main:lifespan:661 - Installing external dependencies of functions and tools...
2026-06-14 15:37:55.904 | INFO     | open_webui.utils.plugin:install_frontmatter_requirements:419 - No requirements found in frontmatter.
2026-06-14 15:37:55.904 | INFO     | open_webui.utils.automations:scheduler_worker_loop:171 - Scheduler worker started (poll interval: 10s)
2026-06-14 15:40:13.061 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50272 - "GET / HTTP/1.1" 200
2026-06-14 15:40:13.246 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50274 - "GET /_app/immutable/chunks/3wvqjvW3.js HTTP/1.1" 200
2026-06-14 15:40:13.250 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50272 - "GET /_app/immutable/entry/start.C43MIJEF.js HTTP/1.1" 200
2026-06-14 15:40:13.251 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50282 - "GET /_app/immutable/entry/app.C84R6Z2n.js HTTP/1.1" 200
2026-06-14 15:40:13.253 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50286 - "GET /_app/immutable/chunks/pAaorhp4.js HTTP/1.1" 200
2026-06-14 15:40:13.263 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50284 - "GET /_app/immutable/nodes/0.p2u0EbrE.js HTTP/1.1" 200
2026-06-14 15:40:13.264 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50304 - "GET /_app/immutable/chunks/DeuB8iJ0.js HTTP/1.1" 200
2026-06-14 15:40:13.272 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50286 - "GET /_app/immutable/chunks/DC0W6J9h.js HTTP/1.1" 200
2026-06-14 15:40:13.278 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50282 - "GET /_app/immutable/chunks/CBKmxchQ.js HTTP/1.1" 200
2026-06-14 15:40:13.280 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50272 - "GET /_app/immutable/chunks/CMqz8Nsr.js HTTP/1.1" 200
2026-06-14 15:40:13.281 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50304 - "GET /_app/immutable/chunks/T8jte1Yq.js HTTP/1.1" 200
2026-06-14 15:40:13.281 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50274 - "GET /_app/immutable/chunks/C7Lxt8YS.js HTTP/1.1" 200
2026-06-14 15:40:13.284 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50284 - "GET /_app/immutable/chunks/BWUGEjTy.js HTTP/1.1" 200
2026-06-14 15:40:13.300 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50286 - "GET /_app/immutable/chunks/CaTaFyk-.js HTTP/1.1" 200
2026-06-14 15:40:13.306 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50272 - "GET /_app/immutable/chunks/C-Awsziq.js HTTP/1.1" 200
2026-06-14 15:40:13.307 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50274 - "GET /_app/immutable/chunks/QTFhdCgb.js HTTP/1.1" 200
2026-06-14 15:40:13.308 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50304 - "GET /_app/immutable/chunks/DadTtvaW.js HTTP/1.1" 200
2026-06-14 15:40:13.311 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50282 - "GET /static/custom.css HTTP/1.1" 200
2026-06-14 15:40:13.312 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50284 - "GET /static/loader.js HTTP/1.1" 200
2026-06-14 15:40:13.316 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50286 - "GET /static/splash.png HTTP/1.1" 200
2026-06-14 15:40:13.445 | INFO     | uvicorn.protocols.http.httptools_impl:send:483 - 172.17.0.1:50282 - "GET /_app/immutable/nodes/1.uvDIxH0j.js HTTP/1.1" 200

Once startup is complete, check that the container is healthy:

Terminal
$ docker ps
CONTAINER ID   IMAGE                                        COMMAND          CREATED         STATUS                   PORTS                    NAMES
98b68ce37c24   ghcr.io/open-webui/open-webui:v0.9.6-cuda   "bash start.sh"  4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:3000->8080/tcp   open-webui

The (healthy) status confirms the container's health check is passing. You can now open localhost:3000 and chat with your local models through the browser interface, with Open WebUI's own workloads running on the GPU alongside Ollama.
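Because the first boot spends a couple of minutes downloading the embedding model, a script that depends on the UI may want to wait for that status instead of checking once. This is a minimal sketch with a hypothetical helper name. The attempt count and delay are arbitrary defaults, not values from Open WebUI.

```shell
# Hypothetical helper: poll the container's health status until it reports
# "healthy" or the attempts run out.
# Arguments: container name, attempts (default 30), delay in seconds (default 5).
wait_until_healthy() {
  name=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=0
  status="unknown"
  while [ "$i" -lt "$attempts" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status="unknown"
    if [ "$status" = "healthy" ]; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up: last status was $status"
  return 1
}

# Usage:
#   wait_until_healthy open-webui && echo "UI ready at http://localhost:3000"
```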

Monitoring GPU Usage

With the GPU-enabled container running, it is satisfying to watch the VRAM and utilisation numbers move in real time while chatting. Two commands cover this well:

Terminal - GPU stats (refreshes every second)
$ watch -n 1 nvidia-smi
Terminal - container resource usage
$ docker stats open-webui
⚠️ VRAM headroom on 4 GB cards: The CUDA image itself reserves some VRAM for the embedding model it loads on startup (around 150–200 MB). Combined with the model layers Ollama loads for inference, you are operating close to the 4 GB ceiling with a 4B model. Avoid pairing the CUDA image with heavier models, or you will see layers spill into system RAM and inference will slow significantly.
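To put a number on that headroom, nvidia-smi can print used and total memory as plain CSV, and the difference is the free VRAM. This is a minimal sketch: vram_headroom is a hypothetical helper, the 512 MiB warning threshold is an arbitrary choice of mine, and only the first GPU line is read.

```shell
# Hypothetical helper: read "used, total" in MiB on stdin (nvidia-smi CSV
# form) and print the free VRAM, flagging it when it drops below a threshold.
# Argument: warning threshold in MiB (default 512, an arbitrary choice).
vram_headroom() {
  threshold=${1:-512}
  awk -F', *' -v limit="$threshold" '
    NR == 1 {
      free = $2 - $1
      printf "%d MiB free of %d MiB", free, $2
      if (free < limit) printf " (low headroom)"
      printf "\n"
    }
  '
}

# Usage on a live system:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#     --format=csv,noheader,nounits | vram_headroom
```

With the idle reading from earlier in this post (196 MiB used of 4096 MiB), the free figure works out to 3900 MiB.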
· · ·

Getting Open WebUI talking to Ollama is not quite the one-liner the documentation suggests - at least not on Linux when Ollama is running as a systemd service. The Docker networking problem in particular is easy to miss if you do not know to check what address Ollama is actually bound to. But once the two pieces are in place - Ollama bound to an address the container can reach, and the NVIDIA Container Toolkit installed for GPU access - the setup is genuinely solid: conversation history, multiple model support, document uploads, and a clean interface, all running locally with no data leaving the machine.

In the next post, I want to explore some of the features Open WebUI adds on top of the basic chat - search engine integration, custom system prompts, and the built-in tools and skills system. More to come.
