We are building a web-based chatbot that runs entirely on your machine. The AI model runs through Ollama, the backend is Node.js with Express, and the frontend is a clean chat interface. Zero data leaves your network.

Prerequisites

Before you start, you need:

  1. Node.js 18 or later (the server uses the built-in fetch API)
  2. Ollama installed and running, with a model pulled (for example: ollama pull llama3.2)

Step 1: Project Setup

mkdir local-chatbot && cd local-chatbot
npm init -y
npm install express

Step 2: Build the Server

Create server.js:

const express = require('express');
const app = express();

app.use(express.json());
app.use(express.static('public'));

const OLLAMA_URL = 'http://localhost:11434/api/chat';
const MODEL = 'llama3.2';

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  try {
    const response = await fetch(OLLAMA_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: MODEL,
        messages,
        stream: false,
      }),
    });

    if (!response.ok) {
      throw new Error('Ollama returned HTTP ' + response.status);
    }

    const data = await response.json();
    res.json({ reply: data.message?.content || 'No response' });
  } catch (err) {
    res.status(500).json({ error: 'Failed to reach Ollama: ' + err.message });
  }
});

app.listen(3000, () => {
  console.log('Chatbot running at http://localhost:3000');
});
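
The messages array the endpoint forwards is the same role/content list the Ollama chat API expects, with the oldest message first. A minimal sketch of that shape, along with a hypothetical validator (isValidHistory is not part of server.js above, just an illustration of a check you could add before forwarding the request):

```javascript
// The role/content message shape forwarded to Ollama: oldest first,
// alternating user and assistant turns.
const messages = [
  { role: 'user', content: 'Hello' },
  { role: 'assistant', content: 'Hi! How can I help?' },
  { role: 'user', content: 'What is 2 + 2?' },
];

// Hypothetical validator: rejects malformed request bodies
// before they reach Ollama.
function isValidHistory(msgs) {
  return Array.isArray(msgs) && msgs.every(
    (m) => ['system', 'user', 'assistant'].includes(m.role)
        && typeof m.content === 'string'
  );
}

console.log(isValidHistory(messages)); // true
```

Sending the full history on every request is what gives the model conversational memory — Ollama itself is stateless between calls.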

Step 3: Build the Frontend

Create public/index.html:

<!DOCTYPE html>
<html>
<head>
  <title>Local AI Chatbot</title>
  <style>
    body { font-family: system-ui; max-width: 600px; margin: 40px auto; background: #111; color: #eee; }
    #chat { height: 400px; overflow-y: auto; border: 1px solid #333; padding: 16px; border-radius: 8px; }
    .msg { margin: 8px 0; padding: 8px 12px; border-radius: 8px; }
    .user { background: #1a3a5c; text-align: right; }
    .bot { background: #1a2a1a; }
    #input { width: 100%; padding: 12px; background: #222; border: 1px solid #333; color: #eee; border-radius: 8px; margin-top: 8px; }
  </style>
</head>
<body>
  <h2>Local AI Chat</h2>
  <div id="chat"></div>
  <input id="input" placeholder="Type a message..." onkeydown="if(event.key==='Enter')send()">
  <script>
    const chat = document.getElementById('chat');
    const input = document.getElementById('input');
    const history = [];

    async function send() {
      const text = input.value.trim();
      if (!text) return;
      input.value = '';
      addMsg('user', text);
      history.push({ role: 'user', content: text });

      try {
        const res = await fetch('/api/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ messages: history }),
        });
        const data = await res.json();
        const reply = data.reply || data.error || 'Error';
        addMsg('bot', reply);
        history.push({ role: 'assistant', content: reply });
      } catch (err) {
        addMsg('bot', 'Request failed: ' + err.message);
      }
    }

    function addMsg(role, text) {
      const div = document.createElement('div');
      div.className = 'msg ' + (role === 'user' ? 'user' : 'bot');
      div.textContent = text;
      chat.appendChild(div);
      chat.scrollTop = chat.scrollHeight;
    }
  </script>
</body>
</html>
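
Note that the history array grows without bound, and every request re-sends the entire conversation. For long chats you may want to cap it. A sketch of a simple cap (trimHistory is a hypothetical helper, not in the code above — you would call it on history before each fetch):

```javascript
// Hypothetical helper: keep only the most recent messages so the
// request payload (and the model's context usage) stays bounded.
function trimHistory(history, maxMessages = 20) {
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}

const long = Array.from({ length: 50 }, (_, i) => ({ role: 'user', content: 'msg ' + i }));
const trimmed = trimHistory(long);
console.log(trimmed.length);     // 20
console.log(trimmed[0].content); // 'msg 30'
```

In the frontend you would send JSON.stringify({ messages: trimHistory(history) }) instead of the raw array. A fancier version could always preserve a leading system message; this sketch just keeps the tail.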

Step 4: Run It

node server.js

Open http://localhost:3000 in your browser. Type a message and press Enter. The response comes from your local Ollama instance — completely private.

Testing It

  1. Verify Ollama is running: curl http://localhost:11434/api/tags
  2. Start the server: node server.js
  3. Open the browser and send a test message
  4. Check the terminal for any error output

What's Next

Add streaming responses for a real-time typing effect. Add conversation persistence with a database. Add model selection so users can switch between models. Or add RAG to let the chatbot answer questions about your documents.
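
As a starting point for streaming: if you set stream: true in the server's request, Ollama responds with newline-delimited JSON, where each line carries a fragment of the reply in message.content. A sketch of accumulating those fragments (the sample chunks below are illustrative values in the documented chunk shape, not captured output):

```javascript
// Each streamed line is one JSON object; concatenating the
// message.content fragments rebuilds the full reply.
function collectStreamedReply(ndjsonText) {
  let reply = '';
  for (const line of ndjsonText.split('\n')) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line);
    if (chunk.message && chunk.message.content) reply += chunk.message.content;
  }
  return reply;
}

const sample = [
  '{"message":{"role":"assistant","content":"Hel"},"done":false}',
  '{"message":{"role":"assistant","content":"lo!"},"done":false}',
  '{"message":{"role":"assistant","content":""},"done":true}',
].join('\n');
console.log(collectStreamedReply(sample)); // Hello!
```

In a real implementation you would read the chunks incrementally from the response body and append each fragment to the chat as it arrives, rather than collecting a complete string first.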

Frequently Asked Questions

Why is the response slow?

Local inference speed depends on your hardware. With CPU-only inference, expect 5-15 tokens per second. With a GPU, expect 30-100+ tokens per second. Smaller models are faster.

Can I use a different model?

Yes. Change the MODEL variable to any model you have pulled with Ollama. Try codellama for code questions or a larger model for better quality.
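
One low-effort way to make this configurable is an environment variable. A sketch, assuming you replace the hard-coded MODEL line in server.js (OLLAMA_MODEL is a name chosen here for illustration, not an Ollama convention):

```javascript
// Read the model name from the environment, falling back to llama3.2.
// OLLAMA_MODEL is a variable name assumed for this example.
const MODEL = process.env.OLLAMA_MODEL || 'llama3.2';
console.log('Using model:', MODEL);
```

Then start the server with, for example, OLLAMA_MODEL=codellama node server.js to switch models without editing code.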