Large language models (LLMs) have become a big deal. They can generate text, assist with coding, and handle all sorts of conversational tasks. With local models becoming more accessible, developers no longer have to rely on cloud-based services to experiment with LLMs. One of the most popular ways to run these models locally is Ollama, a tool designed to simplify working with AI on your own machine.

Running LLMs Locally with Ollama

Ollama makes it easy to download and run different LLMs without needing cloud access. This is great for privacy, performance, and flexibility. With a capable machine, you can run AI models just as easily as you would any local application.
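
Under the hood, Ollama exposes a local HTTP API (on port 11434 by default), so anything that can make an HTTP request can talk to a model you have pulled. Here is a minimal Rust sketch of a single, non-streaming generation request; it assumes the reqwest crate (with its blocking and json features) plus serde_json, that Ollama is already running, and that the model named below is installed:

    use serde_json::json;
    use std::time::Duration;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Local generation can take a while, so raise the client timeout.
        let client = reqwest::blocking::Client::builder()
            .timeout(Duration::from_secs(600))
            .build()?;

        // Ollama serves its HTTP API on localhost:11434 by default.
        let resp: serde_json::Value = client
            .post("http://localhost:11434/api/generate")
            .json(&json!({
                "model": "llama3.2:latest",
                "prompt": "Explain the difference between a stack and a queue in two sentences.",
                "stream": false  // return one JSON object instead of a token stream
            }))
            .send()?
            .json()?;

        // The generated text comes back in the "response" field.
        println!("{}", resp["response"].as_str().unwrap_or_default());
        Ok(())
    }

Swapping out the model field points the same request at any other installed model.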

Right now, my local Ollama setup includes several well-known models:

  • deepseek-r1:8b
  • deepseek-r1:7b
  • llama3.1:8b
  • qwen2.5-coder:7b
  • qwen2.5-coder:14b
  • codegemma:latest
  • llama3.2:latest

A search on the Ollama site turns up many others that can be pulled, and the library of available models is constantly growing and being updated. Here is a quick rundown of the models I have installed:

  • DeepSeek R1 (7B and 8B) is a reasoning-focused model; these smaller tags are distilled from the full R1 and balance speed and capability.
  • Llama 3.1 (8B) is Meta’s open-weight model, optimized for chat.
  • Qwen 2.5 Coder (7B and 14B) is built for code generation and AI-assisted development.
  • CodeGemma is another code-focused model with solid capabilities.
  • Llama 3.2 (latest) is a newer version of the Llama model, designed for efficiency.
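
All of the models above are already pulled on my machine. To see what a local Ollama instance has available, you can ask its /api/tags endpoint; here is a small Rust sketch under the same assumptions as the snippet above:

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // /api/tags lists the models the local Ollama instance has already pulled.
        let resp: serde_json::Value =
            reqwest::blocking::get("http://localhost:11434/api/tags")?.json()?;

        if let Some(models) = resp["models"].as_array() {
            for model in models {
                println!("{}", model["name"].as_str().unwrap_or("?"));
            }
        }
        Ok(())
    }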

DeepSeek R1: 7B vs. 8B

The DeepSeek R1 models I have loaded come in two variants: 7B and 8B. The obvious difference is the extra billion parameters in the 8B model, which make it slightly better at complex reasoning at the cost of more memory. Less obviously, the two variants are built on different base model families.

The 7B model is distilled onto a Qwen 2.5 base, optimized for efficiency with fewer parameters while maintaining strong language understanding. The 8B model is distilled onto a Llama 3.1 base, and its additional parameters improve reasoning and generation. Both are standard dense, decoder-only Transformer architectures.

For most tasks, the 7B model is fast and efficient, making it a good choice when resources are limited. If extra processing power is available, the 8B model can provide more refined responses with improved context understanding.
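
One way to make that tradeoff concrete is to send an identical prompt to both tags and compare wall-clock time and output length. A rough Rust sketch, reusing the same assumptions as the earlier snippets (the prompt and model tags are just examples):

    use serde_json::json;
    use std::time::{Duration, Instant};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let client = reqwest::blocking::Client::builder()
            .timeout(Duration::from_secs(600))
            .build()?;
        let prompt = "Explain why quicksort is O(n log n) on average but O(n^2) in the worst case.";

        for model in ["deepseek-r1:7b", "deepseek-r1:8b"] {
            let start = Instant::now();
            let resp: serde_json::Value = client
                .post("http://localhost:11434/api/generate")
                .json(&json!({ "model": model, "prompt": prompt, "stream": false }))
                .send()?
                .json()?;

            // Report how long each variant took and roughly how much it wrote.
            println!(
                "{model}: {:.1}s, {} characters",
                start.elapsed().as_secs_f32(),
                resp["response"].as_str().unwrap_or_default().len()
            );
        }
        Ok(())
    }

The non-streaming response also carries timing metadata (such as a total_duration field) if you want more precise numbers than wall-clock time.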

Experimenting with LLMs Using Ollama Chat Tauri

To better understand how different LLMs respond to the same prompts, I created a simple cross-platform desktop app called Ollama Chat Tauri.

This app lets me send the same prompt to several models and compare their outputs side by side. The goal is to see how each LLM handles an identical request and to identify where certain models excel.

Tech Stack

  • Rust and Tauri for the backend and app framework.
  • React for the frontend UI.
  • SQLite for local storage of interactions.
  • Ollama API to handle communication with local AI models.

This setup makes it easy to experiment with different LLMs, all from a single interface.
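
To give a feel for how those pieces fit together, here is a stripped-down sketch (not the app's actual code) of a Tauri command that forwards a prompt to the local Ollama API and records the exchange in SQLite. It assumes the reqwest, rusqlite, and serde_json crates, and the table name and schema are purely illustrative:

    use rusqlite::Connection;
    use serde_json::json;

    // Registered with tauri::Builder via .invoke_handler(tauri::generate_handler![ask_model]).
    #[tauri::command]
    async fn ask_model(model: String, prompt: String) -> Result<String, String> {
        // Forward the prompt to the local Ollama API and wait for the full response.
        let resp: serde_json::Value = reqwest::Client::new()
            .post("http://localhost:11434/api/generate")
            .json(&json!({ "model": &model, "prompt": &prompt, "stream": false }))
            .send()
            .await
            .map_err(|e| e.to_string())?
            .json()
            .await
            .map_err(|e| e.to_string())?;
        let reply = resp["response"].as_str().unwrap_or_default().to_string();

        // Persist the interaction locally. A real app would keep one connection in
        // Tauri-managed state instead of reopening the database on every call.
        let db = Connection::open("chat.db").map_err(|e| e.to_string())?;
        db.execute(
            "CREATE TABLE IF NOT EXISTS interactions (model TEXT, prompt TEXT, reply TEXT)",
            [],
        )
        .map_err(|e| e.to_string())?;
        db.execute(
            "INSERT INTO interactions (model, prompt, reply) VALUES (?1, ?2, ?3)",
            rusqlite::params![model, prompt, reply],
        )
        .map_err(|e| e.to_string())?;

        Ok(reply)
    }

On the React side, the UI calls this command through Tauri's invoke helper and renders each model's reply alongside the others.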

The project is open-source, and the code is available here: Ollama Chat Tauri

Where This Is Going

I plan to continue experimenting with AI and LLMs in future projects. Comparing AI-generated code, improving structured problem-solving, and optimizing local AI workflows are all areas I’ll be exploring.

Local AI is a big deal. Running models on your own hardware provides better privacy, lower latency, and more control over the AI’s behavior. With tools like Ollama and frameworks like Tauri, it is easier than ever to build AI-powered applications that run entirely offline.

If you’re curious about LLMs and want to experiment with them locally, Ollama is a great place to start. And if you want to see how different models compare, check out Ollama Chat Tauri.