We are building a web-based chatbot that runs entirely on your machine. The AI model runs through Ollama, the backend is Node.js with Express, and the frontend is a clean chat interface. Zero data leaves your network.

Prerequisites

Before you start, you need:

  1. Node.js 18 or later (the server uses the built-in fetch API)
  2. Ollama installed and running, with a model pulled (for example: ollama pull llama3.2)

Step 1: Project Setup

mkdir local-chatbot && cd local-chatbot
npm init -y
npm install express

Step 2: Build the Server

Create server.js:

const express = require('express');
const app = express();

app.use(express.json());
app.use(express.static('public'));

const OLLAMA_URL = 'http://localhost:11434/api/chat';
const MODEL = 'llama3.2';

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  try {
    const response = await fetch(OLLAMA_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: MODEL,
        messages,
        stream: false,
      }),
    });

    if (!response.ok) {
      throw new Error('Ollama returned HTTP ' + response.status);
    }

    const data = await response.json();
    res.json({ reply: data.message?.content || 'No response' });
  } catch (err) {
    res.status(500).json({ error: 'Failed to reach Ollama: ' + err.message });
  }
});

app.listen(3000, () => {
  console.log('Chatbot running at http://localhost:3000');
});
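
The messages array the endpoint forwards is the same role/content list the Ollama chat API expects, with the oldest message first. A minimal sketch of that shape, along with a hypothetical validator (isValidHistory is not part of server.js above, just an illustration of a check you could add before forwarding the request):

```javascript
// The role/content message shape forwarded to Ollama: oldest first,
// alternating user and assistant turns.
const messages = [
  { role: 'user', content: 'Hello' },
  { role: 'assistant', content: 'Hi! How can I help?' },
  { role: 'user', content: 'What is 2 + 2?' },
];

// Hypothetical validator: rejects malformed request bodies
// before they reach Ollama.
function isValidHistory(msgs) {
  return Array.isArray(msgs) && msgs.every(
    (m) => ['system', 'user', 'assistant'].includes(m.role)
        && typeof m.content === 'string'
  );
}

console.log(isValidHistory(messages)); // true
```

Sending the full history on every request is what gives the model conversational memory — Ollama itself is stateless between calls.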

Step 3: Build the Frontend

Create public/index.html:

<!DOCTYPE html>
<html>
<head>
  <title>Local AI Chatbot</title>
  <style>
    body { font-family: system-ui; max-width: 600px; margin: 40px auto; background: #111; color: #eee; }
    #chat { height: 400px; overflow-y: auto; border: 1px solid #333; padding: 16px; border-radius: 8px; }
    .msg { margin: 8px 0; padding: 8px 12px; border-radius: 8px; }
    .user { background: #1a3a5c; text-align: right; }
    .bot { background: #1a2a1a; }
    #input { width: 100%; padding: 12px; background: #222; border: 1px solid #333; color: #eee; border-radius: 8px; margin-top: 8px; }
  </style>
</head>
<body>
  <h2>Local AI Chat</h2>
  <div id="chat"></div>
  <input id="input" placeholder="Type a message..." onkeydown="if(event.key==='Enter')send()">
  <script>
    const chat = document.getElementById('chat');
    const input = document.getElementById('input');
    const history = [];

    async function send() {
      const text = input.value.trim();
      if (!text) return;
      input.value = '';
      addMsg('user', text);
      history.push({ role: 'user', content: text });

      try {
        const res = await fetch('/api/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ messages: history }),
        });
        const data = await res.json();
        const reply = data.reply || data.error || 'Error';
        addMsg('bot', reply);
        history.push({ role: 'assistant', content: reply });
      } catch (err) {
        addMsg('bot', 'Request failed: ' + err.message);
      }
    }

    function addMsg(role, text) {
      const div = document.createElement('div');
      div.className = 'msg ' + (role === 'user' ? 'user' : 'bot');
      div.textContent = text;
      chat.appendChild(div);
      chat.scrollTop = chat.scrollHeight;
    }
  </script>
</body>
</html>
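
Note that the history array grows without bound, and every request re-sends the entire conversation. For long chats you may want to cap it. A sketch of a simple cap (trimHistory is a hypothetical helper, not in the code above — you would call it on history before each fetch):

```javascript
// Hypothetical helper: keep only the most recent messages so the
// request payload (and the model's context usage) stays bounded.
function trimHistory(history, maxMessages = 20) {
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}

const long = Array.from({ length: 50 }, (_, i) => ({ role: 'user', content: 'msg ' + i }));
const trimmed = trimHistory(long);
console.log(trimmed.length);     // 20
console.log(trimmed[0].content); // 'msg 30'
```

In the frontend you would send JSON.stringify({ messages: trimHistory(history) }) instead of the raw array. A fancier version could always preserve a leading system message; this sketch just keeps the tail.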

Step 4: Run It

node server.js

Open http://localhost:3000 in your browser. Type a message and press Enter. The response comes from your local Ollama instance — completely private.

Testing It

  1. Verify Ollama is running: curl http://localhost:11434/api/tags
  2. Start the server: node server.js
  3. Open the browser and send a test message
  4. Check the terminal for any error output

What's Next

Add streaming responses for a real-time typing effect. Add conversation persistence with a database. Add model selection so users can switch between models. Or add RAG to let the chatbot answer questions about your documents.
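
As a starting point for streaming: if you set stream: true in the server's request, Ollama responds with newline-delimited JSON, where each line carries a fragment of the reply in message.content. A sketch of accumulating those fragments (the sample chunks below are illustrative values in the documented chunk shape, not captured output):

```javascript
// Each streamed line is one JSON object; concatenating the
// message.content fragments rebuilds the full reply.
function collectStreamedReply(ndjsonText) {
  let reply = '';
  for (const line of ndjsonText.split('\n')) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line);
    if (chunk.message && chunk.message.content) reply += chunk.message.content;
  }
  return reply;
}

const sample = [
  '{"message":{"role":"assistant","content":"Hel"},"done":false}',
  '{"message":{"role":"assistant","content":"lo!"},"done":false}',
  '{"message":{"role":"assistant","content":""},"done":true}',
].join('\n');
console.log(collectStreamedReply(sample)); // Hello!
```

In a real implementation you would read the chunks incrementally from the response body and append each fragment to the chat as it arrives, rather than collecting a complete string first.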

Frequently Asked Questions

Why is the response slow?

Local inference speed depends on your hardware. With CPU-only inference, expect 5-15 tokens per second. With a GPU, expect 30-100+ tokens per second. Smaller models are faster.

Can I use a different model?

Yes. Change the MODEL variable to any model you have pulled with Ollama. Try codellama for code questions or a larger model for better quality.
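
One low-effort way to make this configurable is an environment variable. A sketch, assuming you replace the hard-coded MODEL line in server.js (OLLAMA_MODEL is a name chosen here for illustration, not an Ollama convention):

```javascript
// Read the model name from the environment, falling back to llama3.2.
// OLLAMA_MODEL is a variable name assumed for this example.
const MODEL = process.env.OLLAMA_MODEL || 'llama3.2';
console.log('Using model:', MODEL);
```

Then start the server with, for example, OLLAMA_MODEL=codellama node server.js to switch models without editing code.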