By the end of this tutorial, you will have a working AI model running locally on your machine — no cloud APIs, no subscriptions, no data leaving your computer. Total time: about 5 minutes.
Prerequisites
- A computer with at least 8GB RAM (16GB+ recommended)
- macOS, Linux, or Windows
- An internet connection (for the initial download only)
Step 1: Install Ollama
Linux
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
This downloads and installs the Ollama binary. After installation, the Ollama service starts automatically.
macOS
Download from ollama.com and drag to Applications, or use Homebrew:
```shell
brew install ollama
```
Windows
Download the installer from ollama.com and run it. Ollama will install as a system service.
Step 2: Pull Your First Model
Open a terminal and run:
```shell
ollama pull llama3.2
```
This downloads the Llama 3.2 model (about 2GB for the 3B version). You will see a progress bar as the model downloads.
Step 3: Start Chatting
```shell
ollama run llama3.2
```
You are now in an interactive chat session with a local AI model. Type any question or prompt and press Enter. The model runs entirely on your hardware — no internet needed after the initial download.
Step 4: Try the API
Ollama also runs an API server. Open another terminal and try:
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in one paragraph",
  "stream": false
}'
```
You should get a JSON response with the model's output. This API is what you will use to integrate AI into your applications.
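As a starting point for that integration, here is a minimal Python sketch that calls the same endpoint using only the standard library. It assumes the Ollama server is running on its default port (11434); the function names are illustrative, not part of any official client.

```python
# Minimal sketch: call Ollama's /api/generate endpoint from Python.
# Assumes a local Ollama server on the default port 11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """POST the request and return the model's text output."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response carries the full text in "response".
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server):
# print(generate("llama3.2", "Explain quantum computing in one paragraph"))
```

With `"stream": false` the server returns one complete JSON object; with streaming enabled you would instead read newline-delimited JSON chunks.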
Step 5: Explore More Models
```shell
# List installed models
ollama list

# Pull a larger, more capable model
ollama pull gemma3:27b

# Pull a code-specialized model
ollama pull codellama

# Remove a model you no longer need
ollama rm llama3.2
```
Testing It
To verify everything works:
- Run `ollama list`; you should see your installed models.
- Run `curl http://localhost:11434/api/tags`; the API should return model info.
- Run `ollama run llama3.2 "What is 2+2?"`; you should get a response.
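You can script the second check too. Below is a hedged sketch that fetches `/api/tags` and extracts the installed model names; it assumes the default endpoint and the response shape shown by the API (a `models` array of objects with a `name` field).

```python
# Sketch: verify the local Ollama server by listing installed models
# via the /api/tags endpoint on the default port.
import json
import urllib.request

def model_names(tags_json: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_json.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch /api/tags and return the installed model names."""
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return model_names(json.loads(resp.read()))

# Usage (requires a running Ollama server):
# print(list_local_models())
```

If the call raises a connection error, the server is not running; start it (or the desktop app) and retry.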
What's Next
Now that you have Ollama running, you can build a chatbot with Node.js, create a document Q&A system with RAG, or set up an AI-powered API for your applications.
Frequently Asked Questions
How much disk space do models need?
Small models (3B-7B): 2-5GB. Medium models (13B-34B): 8-20GB. Large models (70B): 25-40GB depending on quantization level.
Can I run multiple models simultaneously?
Yes, but each loaded model uses RAM. Ollama automatically unloads idle models after a timeout. With enough RAM, multiple models can be loaded at once.
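If you want to control that unload timeout yourself, generate and chat requests accept a `keep_alive` field. A small sketch, assuming the same `/api/generate` payload shape as above (the function name is illustrative):

```python
# Sketch: build a generate request whose keep_alive field controls how
# long the model stays loaded in RAM after the request completes.
import json

def payload_with_keep_alive(model: str, prompt: str,
                            keep_alive: str = "10m") -> str:
    """Return a JSON request body with an explicit keep_alive duration."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        # e.g. "10m" keeps the model loaded for ten minutes;
        # 0 unloads it immediately after the response.
        "keep_alive": keep_alive,
    })
```

Setting a longer `keep_alive` avoids reload latency between requests at the cost of RAM; setting it to `0` frees memory for other models.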