Running an LLM on My Laptop — A Weekend Experiment with Ollama
If you have been following the AI space at all, you have probably been using tools like ChatGPT or Gemini through a web browser. They are impressive, but there is always this nagging feeling — your prompts are being sent to someone else's server, logged, possibly used for training, and you have a usage limit. For a lot of experiments I want to run, that setup is not ideal.
So I decided to try running a language model entirely on my laptop. No internet connection required, no API key, no subscription. Just the model, my GPU, and a terminal. Here is how it went.
Why This Made Sense for Me
My background on this blog has always been computer vision — working with OpenCV, JavaCV, image processing pipelines. Naturally, I have been watching the progression from classical vision algorithms to deep learning to now these large multimodal models with a lot of interest. Running a language model locally felt like a natural next experiment - it is the same general idea of getting a powerful model to run on your own hardware rather than relying on a cloud service.
The other motivation: I have a Raspberry Pi 4 sitting on my desk doing not much. I wanted to eventually get a small model running on it too. The laptop would be step one.
The Hardware
Here is what I was working with for this experiment:
| Component | Spec |
|---|---|
| Laptop | MSI Katana GF66 11UC |
| CPU | Intel Core i7-11800H (8 cores / 16 threads) |
| GPU | NVIDIA RTX 3050 — 4 GB VRAM |
| RAM | 16 GB DDR4 |
| OS | Linux Mint 22.3 (x64) |
The RTX 3050 is not a high-end card, but it has 4 GB of dedicated VRAM, which is enough to run smaller models entirely on the GPU. That means fast inference - not the agonizingly slow CPU-only speeds you might expect on a consumer laptop.
Prerequisites
First, confirm that the NVIDIA driver is installed and working by running the nvidia-smi command:
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.58.03 Driver Version: 595.58.03 CUDA Version: 13.2 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 ... Off | 00000000:01:00.0 Off | N/A |
| N/A 47C P0 752W / 60W | 14MiB / 4096MiB | 9% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1267 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
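If you would rather read the VRAM numbers from a script than from the table above, nvidia-smi also has a CSV query mode. The following is a minimal sketch, not part of my original setup; the function names are my own, and it assumes the driver reports plain integer MiB values for every GPU.

```python
import subprocess

# nvidia-smi query mode: CSV output, no header row, no unit suffixes.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Turn one CSV line like 'Some GPU, 4096, 14' into a dict (values in MiB)."""
    # Split from the right so a comma inside the GPU name cannot break parsing.
    name, total, used = (field.strip() for field in line.rsplit(",", 2))
    total_mib, used_mib = int(total), int(used)
    return {
        "name": name,
        "total_mib": total_mib,
        "used_mib": used_mib,
        "free_mib": total_mib - used_mib,
    }


def query_gpus():
    """Run nvidia-smi and return one dict per GPU. Raises if the command fails."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling query_gpus() on a machine with a working driver should return one entry per GPU; free_mib is the figure that matters when deciding which model size to try.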
After confirming that the driver was working, I installed nvidia-cuda-toolkit, which provides nvcc. This step is optional: Ollama ships its own pre-compiled CUDA binaries, so it detects and uses an NVIDIA GPU out of the box. I installed the toolkit anyway in case I ever want to compile Ollama from source or customize the underlying llama.cpp engine that Ollama relies on.
$ sudo apt install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
libnvidia-cfg1-590 libnvidia-common-590 libnvidia-compute-590 libnvidia-decode-590 libnvidia-egl-wayland1:i386 libnvidia-encode-590 libnvidia-extra-590 libnvidia-fbc1-590 libnvidia-gl-590 libwayland-server0:i386
nvidia-compute-utils-590 nvidia-firmware-590-590.48.01 nvidia-kernel-source-590-open nvidia-utils-590 xserver-xorg-video-nvidia-590
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
ca-certificates-java cpp-12 fonts-dejavu-extra g++-12 gcc-12 gcc-12-base ibverbs-providers java-common libaccinj64-12.0 libamd-comgr2 libamdhip64-5 libatk-wrapper-java libatk-wrapper-java-jni libcu++-dev libcub-dev libcublas12
libcublaslt12 libcudart12 libcufft11 libcufftw11 libcuinj64-12.0 libcupti-dev libcupti-doc libcupti12 libcurand10 libcusolver11 libcusolvermg11 libcusparse12 libgcc-12-dev libgl-dev libglx-dev libhsa-runtime64-1 libhsakmt1
libhwloc-plugins libhwloc15 libibumad3 libjpeg62 libnppc12 libnppial12 libnppicc12 libnppidei12 libnppif12 libnppig12 libnppim12 libnppist12 libnppisu12 libnppitc12 libnpps12 libnvblas12 libnvidia-ml-dev libnvjitlink12 libnvjpeg12
libnvrtc-builtins12.0 libnvrtc12 libnvtoolsext1 libnvvm4 libpfm4 libpthread-stubs0-dev librdmacm1t64 libstdc++-12-dev libtbb-dev libtbb12 libtbbbind-2-5 libtbbmalloc2 libthrust-dev libucx0 libvdpau-dev libx11-dev libxau-dev
libxcb1-dev libxdmcp-dev node-html5shiv nsight-compute nsight-compute-target nsight-systems nsight-systems-target nvidia-cuda-dev nvidia-cuda-gdb nvidia-cuda-toolkit-doc nvidia-opencl-dev nvidia-profiler nvidia-visual-profiler
ocl-icd-opencl-dev opencl-c-headers opencl-clhpp-headers openjdk-8-jre openjdk-8-jre-headless x11proto-dev xorg-sgml-doctools xtrans-dev
Suggested packages:
gcc-12-locales cpp-12-doc g++-12-multilib gcc-12-doc gcc-12-multilib default-jre libhwloc-contrib-plugins libstdc++-12-doc libtbb-doc libvdpau-doc libx11-doc libxcb-doc nvidia-cuda-samples opencl-clhpp-headers-doc fonts-nanum
fonts-ipafont-gothic fonts-ipafont-mincho fonts-wqy-microhei fonts-wqy-zenhei fonts-indic
Recommended packages:
libnvcuvid1
The following NEW packages will be installed:
ca-certificates-java cpp-12 fonts-dejavu-extra g++-12 gcc-12 gcc-12-base ibverbs-providers java-common libaccinj64-12.0 libamd-comgr2 libamdhip64-5 libatk-wrapper-java libatk-wrapper-java-jni libcu++-dev libcub-dev libcublas12
libcublaslt12 libcudart12 libcufft11 libcufftw11 libcuinj64-12.0 libcupti-dev libcupti-doc libcupti12 libcurand10 libcusolver11 libcusolvermg11 libcusparse12 libgcc-12-dev libgl-dev libglx-dev libhsa-runtime64-1 libhsakmt1
libhwloc-plugins libhwloc15 libibumad3 libjpeg62 libnppc12 libnppial12 libnppicc12 libnppidei12 libnppif12 libnppig12 libnppim12 libnppist12 libnppisu12 libnppitc12 libnpps12 libnvblas12 libnvidia-ml-dev libnvjitlink12 libnvjpeg12
libnvrtc-builtins12.0 libnvrtc12 libnvtoolsext1 libnvvm4 libpfm4 libpthread-stubs0-dev librdmacm1t64 libstdc++-12-dev libtbb-dev libtbb12 libtbbbind-2-5 libtbbmalloc2 libthrust-dev libucx0 libvdpau-dev libx11-dev libxau-dev
libxcb1-dev libxdmcp-dev node-html5shiv nsight-compute nsight-compute-target nsight-systems nsight-systems-target nvidia-cuda-dev nvidia-cuda-gdb nvidia-cuda-toolkit nvidia-cuda-toolkit-doc nvidia-opencl-dev nvidia-profiler
nvidia-visual-profiler ocl-icd-opencl-dev opencl-c-headers opencl-clhpp-headers openjdk-8-jre openjdk-8-jre-headless x11proto-dev xorg-sgml-doctools xtrans-dev
0 upgraded, 91 newly installed, 0 to remove and 69 not upgraded.
Need to get 2,261 MB of archives.
After this operation, 6,892 MB of additional disk space will be used.
Once the installation was complete, I verified it by running the nvcc --version command:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
The Tool: Ollama
Ollama is the tool that makes local LLM inference easy. Think of it like a package manager for AI models - it handles model downloads, quantization, GPU memory management, and exposes a REST API, all through a single binary. It detects your NVIDIA GPU automatically and figures out how many model layers to load into VRAM. You do not have to configure any of that manually.
It also has a growing library of models you can pull with one command, similar to how docker pull works.
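To see how much of a loaded model actually ended up in VRAM, the REST API has a /api/ps endpoint that reports the same information as the ollama ps command. Here is a small sketch, assuming the service is already running on its default port 11434 and that each entry carries size and size_vram byte counts; the helper names are my own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def gpu_share(model_entry):
    """Fraction of a loaded model held in VRAM, from 'size' and 'size_vram' (bytes)."""
    size = model_entry.get("size", 0)
    return model_entry.get("size_vram", 0) / size if size else 0.0


def loaded_models(base_url=OLLAMA_URL):
    """Fetch the list of currently loaded models from GET /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return json.load(resp).get("models", [])
```

A share of 1.0 would mean the whole model sits on the GPU; anything lower means some layers are running on the CPU, which is usually where the slowdown comes from.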
Installation
One command:
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
The installer detects your OS, downloads the right binary, and sets up Ollama as a background service. Once done, verify it:
$ ollama --version
ollama version is 0.24.0
$ systemctl status ollama
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Thu 2026-05-28 00:53:08 IST; 14min ago
Main PID: 19849 (ollama)
Tasks: 21 (limit: 18725)
Memory: 61.2M (peak: 292.1M)
CPU: 3.397s
CGroup: /system.slice/ollama.service
└─19849 /usr/local/bin/ollama serve
No config files, no dependency juggling. That was the entire setup.
Picking the Models
1. Phi-4 Mini
With 4 GB of VRAM I needed to be selective. I went with Phi-4 Mini from Microsoft - a 3.8 billion parameter model that fits comfortably within 4 GB and genuinely performs well for its size.
For anyone wondering about model quality - it handles multi-step reasoning, code generation, and technical questions surprisingly well for a 3.8B model. It is not GPT-4, but for a laptop running a fully offline model, the output quality is genuinely impressive.
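The "fits within 4 GB" claim comes down to simple arithmetic on parameter count and bits per weight. The sketch below is a rough back-of-the-envelope estimate, not how Ollama itself decides: the bits-per-weight values and the 0.5 GB overhead for context and runtime buffers are my own assumptions.

```python
def estimate_weights_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(weights_gb, vram_gb, overhead_gb=0.5):
    """Rough check: weights plus an assumed overhead must fit in VRAM."""
    return weights_gb + overhead_gb <= vram_gb


# A 3.8B model at 16 bits per weight versus an assumed 4-bit quantization.
full_precision = estimate_weights_gb(3.8, 16)
quantized = estimate_weights_gb(3.8, 4)
print(f"16-bit: {full_precision:.1f} GB, fits in 4 GB: {fits_in_vram(full_precision, 4.0)}")
print(f" 4-bit: {quantized:.1f} GB, fits in 4 GB: {fits_in_vram(quantized, 4.0)}")
```

At 16 bits the weights alone would need about 7.6 GB, which is why a quantized build is the only realistic option on a 4 GB card.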
Running It
Ollama has a simple ollama run command that takes a model name. It first looks for the model on the local machine; if the model is not there, Ollama downloads it and then runs it.
$ ollama run phi4-mini
pulling manifest
pulling 3c168af1dea0: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 2.5 GB
pulling 813f53fdc6e5: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 655 B
pulling fa8235e5b48f: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling 8c2539a423c4: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 411 B
verifying sha256 digest
writing manifest
success
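The "is it already on disk?" check that ollama run performs can also be done from a script through the /api/tags endpoint, which lists downloaded models. This is a sketch of that idea only, not Ollama's own logic; the helper names are mine, and it assumes the service is on its default port 11434.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def model_is_local(tags, name):
    """True if `name` is in an /api/tags response; a bare name matches ':latest'."""
    wanted = name if ":" in name else f"{name}:latest"
    return any(entry.get("name") == wanted for entry in tags.get("models", []))


def fetch_tags(base_url=OLLAMA_URL):
    """Fetch the list of downloaded models from GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

For example, model_is_local(fetch_tags(), "phi4-mini") should be true once the pull above has finished.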
I tried a few different types of prompts to get a sense of what it could do:
Sample conversations
After these conversations I was not satisfied with Phi-4 Mini, so I started looking at other models that would fit my hardware. Gemma3:4B looked like a good candidate, so I went ahead and installed it.
2. Gemma3:4B
$ ollama pull gemma3:4b
pulling manifest
pulling aeda25e63ebd: 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ ▏ 3.3 GB/3.3 GB 1.5 MB/s 0s
pulling e0a42594d802: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 358 B
pulling dd084c7d92a3: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.4 KB
pulling 3116c5225075: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 77 B
pulling b6ae5839783f: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 489 B
verifying sha256 digest
writing manifest
success
Sample conversations (I asked the same first question):
The output of Gemma3:4B was far better than that of Phi-4 Mini. I will keep using Gemma3:4B on my laptop for some time, and then I will try to run it on my Raspberry Pi 4 (if it can handle it).
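My comparison above was about answer quality, which is subjective. Speed is easier to measure: non-streaming responses from the REST API include eval_count (tokens generated) and eval_duration (nanoseconds), so tokens per second falls out directly. The sketch below only does that arithmetic on responses you have already collected; the function names are my own, and I did not record such numbers for this post.

```python
def tokens_per_second(response):
    """Generation speed from 'eval_count' (tokens) and 'eval_duration' (nanoseconds)."""
    duration_ns = response.get("eval_duration", 0)
    if not duration_ns:
        return 0.0
    return response.get("eval_count", 0) * 1e9 / duration_ns


def fastest(results):
    """Given {model_name: response_dict}, return the name with the highest tokens/s."""
    return max(results, key=lambda name: tokens_per_second(results[name]))
```

Feed it one response per model for the same prompt, and keep in mind that a faster model is not necessarily the one giving better answers.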
Adding a Web Interface
The terminal chat works fine, but for longer sessions I wanted something more comfortable. Open WebUI is a self-hosted interface — essentially a ChatGPT clone — that connects to Ollama automatically. With Docker it is a single command:
$ docker run -d \
    -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    --name open-webui \
    --restart always \
    ghcr.io/open-webui/open-webui:main
# Open: http://localhost:3000
You get conversation history, multiple model support, and a clean interface — all running locally, zero data leaving your machine.
Handy Commands to Know
# List downloaded models
$ ollama list
# Pull without running
$ ollama pull mistral
# Check what's loaded in GPU memory
$ ollama ps
# Hit the REST API directly (useful for scripts)
$ curl http://localhost:11434/api/generate \
    -d '{"model":"phi4-mini","prompt":"Hello!","stream":false}'
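The same REST call is easy to make from Python with only the standard library. This is a minimal sketch that mirrors the curl request above; the function names are my own, and it assumes the non-streaming reply carries the generated text in a "response" field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build the POST request for /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL):
    """Send the prompt to a running Ollama service and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    # Generous timeout: the first call may need to load the model into memory.
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]
```

With the service running, print(generate("phi4-mini", "Hello!")) should print the model's reply.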
If you are sitting on a laptop with an NVIDIA card and running Linux, there is genuinely no reason not to try this. The setup is simpler than anything I was doing with JavaCV back in 2013, and the results are far more immediately impressive. With the systemd service install used here, the model files live under the ollama user's home in /usr/share/ollama/.ollama/models (they land in ~/.ollama/models only if you run ollama serve as your own user). Everything is local, and you can delete it all cleanly if it is not for you.
Good to be writing here again. More experiments coming.