By the end of this tutorial, you will have a working AI model running locally on your machine — no cloud APIs, no subscriptions, no data leaving your computer. Total time: about 5 minutes.

Prerequisites

You need a computer running Linux, macOS, or Windows, at least 8GB of RAM (16GB is more comfortable for larger models), a few gigabytes of free disk space, and basic familiarity with a terminal.

Step 1: Install Ollama

Linux

curl -fsSL https://ollama.com/install.sh | sh

This downloads and installs the Ollama binary. After installation, the Ollama service starts automatically. You can confirm the install with ollama --version.

macOS

Download from ollama.com and drag to Applications, or use Homebrew:

brew install ollama

Windows

Download the installer from ollama.com and run it. Ollama runs in the background after installation, so the ollama command is available from any terminal.

Step 2: Pull Your First Model

Open a terminal and run:

ollama pull llama3.2

This downloads the Llama 3.2 model (about 2GB for the default 3B version; a smaller 1B variant is available as llama3.2:1b). You will see a progress bar as the model downloads.

Step 3: Start Chatting

ollama run llama3.2

You are now in an interactive chat session with a local AI model. Type any question or prompt and press Enter. The model runs entirely on your hardware — no internet needed after the initial download. Type /bye (or press Ctrl+D) to leave the session.

Step 4: Try the API

Ollama also runs an API server. Open another terminal and try:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in one paragraph",
  "stream": false
}'

You should get a JSON response with the model's output. This API is what you will use to integrate AI into your applications.
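The same request is easy to script. The sketch below builds the request body and pulls the generated text out of the response using Python's standard library only; the sample response is trimmed to the fields the example reads ("response" and "done"), and the generate() helper assumes Ollama is listening on its default port 11434.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_generate_payload(model: str, prompt: str, stream: bool = False) -> str:
    """Serialize a request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

def extract_text(raw: str) -> str:
    """Pull the generated text out of a non-streaming /api/generate response."""
    return json.loads(raw)["response"]

def generate(model: str, prompt: str) -> str:
    """POST a prompt to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(model, prompt).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(resp.read().decode())

# A trimmed sample of the JSON Ollama returns when "stream" is false:
sample = '{"model": "llama3.2", "response": "Quantum computing uses qubits.", "done": true}'
print(extract_text(sample))
```

With Ollama running, generate("llama3.2", "Explain quantum computing in one paragraph") performs the same call as the curl command above.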

Step 5: Explore More Models

# List installed models
ollama list

# Pull a larger, more capable model
ollama pull gemma3:27b

# Pull a code-specialized model
ollama pull codellama

# Remove a model you no longer need
ollama rm llama3.2
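ollama list has an API counterpart at /api/tags, which is handy when your application needs to check which models are available before choosing one. The sketch below parses a sample response; the sample is trimmed to the "models" array with "name" and "size" fields, which is an assumption based on what the endpoint returns.

```python
import json

def model_names(tags_json: str) -> list[str]:
    """List installed model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

# Trimmed sample of what GET /api/tags returns:
sample_tags = (
    '{"models": ['
    '{"name": "llama3.2:latest", "size": 2019393189}, '
    '{"name": "codellama:latest", "size": 3825910662}'
    ']}'
)
print(model_names(sample_tags))
```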

Testing It

To verify everything works:

  1. Run ollama list — you should see your installed models
  2. Run curl http://localhost:11434/api/tags — the API should return model info
  3. Run ollama run llama3.2 "What is 2+2?" — you should get a response
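The first two checks can also be automated. This small sketch probes the /api/tags endpoint and reports whether the Ollama server is answering, assuming the default port 11434; it returns False instead of crashing when the server is down.

```python
import json
import urllib.error
import urllib.request

def ollama_reachable(base: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama API answers with valid JSON on its port."""
    try:
        with urllib.request.urlopen(base + "/api/tags", timeout=timeout) as resp:
            json.load(resp)  # should parse as JSON with a "models" list
            return True
    except (urllib.error.URLError, ValueError, OSError):
        return False

print("Ollama API reachable:", ollama_reachable())
```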

What's Next

Now that you have Ollama running, you can build a chatbot with Node.js, create a document Q&A system with RAG, or set up an AI-powered API for your applications.

Frequently Asked Questions

How much disk space do models need?

Small models (3B-7B): 2-5GB. Medium models (13B-34B): 8-20GB. Large models (70B): 25-40GB depending on quantization level.

Can I run multiple models simultaneously?

Yes, but each loaded model uses RAM. Ollama automatically unloads idle models after a timeout. With enough RAM, multiple models can be loaded at once.
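You can influence that unload timeout per request with the keep_alive field in the request body, which accepts a duration string, 0 to unload immediately, or -1 to keep the model loaded. The sketch below just builds such a payload; the "10m" default here is an arbitrary example value, not Ollama's default.

```python
import json

def payload_with_keep_alive(model: str, prompt: str, keep_alive: str = "10m") -> dict:
    """Build an /api/generate request body with an explicit keep_alive.

    keep_alive controls how long the model stays loaded after the request:
    a duration like "10m", 0 to unload immediately, or -1 to keep it loaded.
    """
    return {"model": model, "prompt": prompt, "stream": False, "keep_alive": keep_alive}

print(json.dumps(payload_with_keep_alive("llama3.2", "hi")))
```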