
How to Run an AI Trading Bot Locally with Ollama (No API Keys)

Published 19 Apr 2026 · 7 min read · Tutorial

If you want an AI trading bot but don't want to hand your portfolio snapshots to a cloud LLM, you have a third option beyond paying for a cloud API or skipping AI entirely: run the model on your own machine via Ollama, and point KlawTrade at it. No API keys, no marginal cost, no data leaving your network.

This guide shows the exact 15-minute setup. By the end, you'll have a bot where every trade signal is generated by Llama 3.1 or similar running on your laptop, and every signal still passes through the same deterministic 14-check risk gate that ships with KlawTrade.

Why this works: Ollama exposes an OpenAI-compatible HTTP API. KlawTrade's AI strategy talks to any OpenAI-compatible endpoint via a configurable base_url. So the bot thinks it's talking to OpenAI, but it's really talking to a model on your laptop. Same code path, same schema validation, same risk gate.
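"OpenAI-compatible" means the request body is byte-for-byte the same chat-completions JSON; only the base URL (and, for Ollama, a dummy API key) changes. A minimal sketch of that idea — the system prompt and signal schema here are illustrative, not KlawTrade's actual prompt:

```python
import json

# Same chat-completions payload works against either backend; only the
# base URL differs (plus a throwaway API key for the Ollama side).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def chat_payload(model: str, snapshot: str) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "temperature": 0.0,
        "messages": [
            {"role": "system",
             "content": 'Reply only with JSON: {"action": ..., "confidence": ...}'},
            {"role": "user", "content": snapshot},
        ],
    }

payload = chat_payload("llama3.1:8b-instruct-q4_K_M", "RSI=28, MACD cross up")
print(json.dumps(payload, indent=2))
```

Swapping providers is therefore a configuration change, not a code change — which is exactly why the same schema validation and risk gate apply unchanged.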

Step 1 — Install Ollama and pull a model

Download Ollama from ollama.com/download. On macOS and Linux:

# macOS (Homebrew)
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve &

# Pull a capable instruct-tuned model
ollama pull llama3.1:8b-instruct-q4_K_M

For trading decisions you want an instruct-tuned model with decent JSON discipline. Good picks, roughly in order of recommendation:

  • llama3.1:8b-instruct-q4_K_M — 8B parameters, runs on 16 GB RAM, excellent JSON compliance.
  • qwen2.5:14b-instruct — a bit slower but noticeably better reasoning on technical indicators.
  • mistral-nemo:12b-instruct — strong tool-use behaviour, similar quality to Qwen.
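Before committing to a model, it's worth measuring its JSON discipline yourself: collect a batch of raw replies and count how many parse into the expected shape. A small sketch of such a check — the action/confidence schema is an assumption for illustration, not KlawTrade's exact one:

```python
import json

REQUIRED_KEYS = {"action", "confidence"}  # illustrative schema

def is_valid_signal(raw: str) -> bool:
    """True if a raw model reply parses as JSON with the expected keys and types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(obj, dict)
            and REQUIRED_KEYS <= obj.keys()
            and obj["action"] in {"BUY", "SELL", "HOLD"}
            and isinstance(obj["confidence"], (int, float)))

replies = [
    '{"action": "BUY", "confidence": 0.91}',
    'Sure! Here is the JSON: {"action": "BUY"}',  # chatty preamble -> invalid
    '{"action": "HOLD", "confidence": 0.55}',
]
rate = sum(map(is_valid_signal, replies)) / len(replies)
print(f"valid-JSON rate: {rate:.0%}")
```

Run a few hundred snapshots through a candidate model this way; a model that can't stay above ~95% valid JSON at temperature 0 will waste cycles on retries.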

Step 2 — Install KlawTrade with AI extras

pip install "klawtrade[ai]"
klawtrade init  # generates config/settings.yaml

Step 3 — Configure the local provider

Open config/settings.yaml and update the strategy.ai block:

strategy:
  ai:
    enabled: true
    provider: "local"                          # <-- key line
    base_url: "http://localhost:11434/v1"      # Ollama default
    api_key: "ollama"                          # any non-empty string
    model: "llama3.1:8b-instruct-q4_K_M"
    temperature: 0.0
    min_confidence: 0.80                        # stricter for smaller models
    require_rule_confirmation: true             # belt + braces

  rules:
    momentum: true
    mean_reversion: true

Two notes on accuracy when using smaller local models:

  • Bump min_confidence to 0.80 or higher. Local models tend to be overconfident; the threshold prunes marginal signals.
  • Leave require_rule_confirmation: true. This guarantees every LLM-generated BUY/SELL also has a classic indicator supporting it (RSI, MACD, SMA cross, or Bollinger) — a strong guard against the kind of soft hallucinations that smaller models produce.
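The combined effect of those two settings can be sketched as a small predicate. This is only a mirror of the two config knobs above with assumed field names — KlawTrade's actual gate runs 14 checks beyond this:

```python
def passes_pre_gate(signal: dict,
                    confirmations: set,
                    min_confidence: float = 0.80,
                    require_rule_confirmation: bool = True) -> bool:
    """Sketch of the confidence floor + indicator cross-check.
    `confirmations` holds the classic indicators agreeing with the signal."""
    if signal["action"] == "HOLD":
        return False                      # nothing to trade
    if signal["confidence"] < min_confidence:
        return False                      # prune marginal, overconfident calls
    if require_rule_confirmation and not confirmations:
        return False                      # the LLM alone is never enough
    return True

# A BUY at 0.91 confidence with an RSI confirmation gets through...
print(passes_pre_gate({"action": "BUY", "confidence": 0.91}, {"rsi_oversold"}))
# ...the same BUY with no confirming indicator does not.
print(passes_pre_gate({"action": "BUY", "confidence": 0.91}, set()))
```

Note the ordering: the cheap checks (confidence, confirmation) run before anything reaches the heavier risk gate, so a sloppy local model mostly costs you skipped signals, not bad trades.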

Step 4 — Start the bot

klawtrade start

You should see a log line like:

INFO  AI strategy enabled provider=local model=llama3.1:8b-instruct-q4_K_M

Check http://localhost:8080 for the real-time dashboard. Trades will start flowing as soon as the strategy finds a setup that satisfies both the model and the classic indicator cross-check — and, of course, the 14-check risk gate.
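If no trades ever flow, the most common cause is that the bot can't reach Ollama at all. A quick standard-library check for whether anything is listening on the default port (11434):

```python
import socket

def endpoint_reachable(host: str = "localhost", port: int = 11434,
                       timeout: float = 2.0) -> bool:
    """True if something is listening on host:port (Ollama's default is 11434)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not endpoint_reachable():
    print("Nothing listening on :11434 -- did `ollama serve` start?")
```

This only confirms the TCP port is open, not that the model is loaded; a slow first response after startup is normal while Ollama pages the weights in.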

Accuracy: how good is this really?

Local models are not Claude or GPT-4. On our internal backtest bench (see the Claude vs GPT post), an 8B local model produced about 95% valid JSON responses versus 100% for Claude Sonnet 4. The post-risk-gate hit rate was within one percentage point of the cloud models, though — because the classic indicator cross-check and 14-check risk gate filter out most of the low-quality local-model signals before they become trades.

The punchline: the gate does more work than the model. If you're comfortable with somewhat noisier signal generation in exchange for zero marginal cost and full data privacy, local-via-Ollama is a genuinely viable path.

Other local runtimes

Ollama is the easiest option, but KlawTrade works with any OpenAI-compatible endpoint. A few alternatives:

  • LM Studio — GUI-driven. Set base_url: http://localhost:1234/v1.
  • vLLM — production-grade serving. Set base_url: http://localhost:8000/v1.
  • LocalAI — supports many backends. base_url: http://localhost:8080/v1 (note this collides with KlawTrade's default dashboard port, so move one of them).
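Because all of these speak the same wire format, switching runtimes is a two-line config edit. For example, pointing the same strategy.ai block at LM Studio instead of Ollama (model name is whatever your LM Studio instance reports):

```yaml
strategy:
  ai:
    provider: "local"
    base_url: "http://localhost:1234/v1"   # LM Studio instead of Ollama
    api_key: "lm-studio"                   # still any non-empty string
```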

Hardware requirements

  • 8B models (Llama 3.1 / Qwen2.5-7B) — 16 GB RAM minimum, 24 GB comfortable. CPU-only works but is slow; an Apple Silicon M-series or an NVIDIA GPU with 8+ GB VRAM is much better.
  • 13-14B models — 32 GB RAM, or a 16 GB VRAM GPU.
  • 70B models — consumer hardware struggles. Use a dedicated GPU server or fall back to cloud.
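The rough arithmetic behind these tiers: weight memory ≈ parameter count × bits per parameter ÷ 8, plus KV cache and runtime overhead on top. Back-of-envelope, assuming q4_K_M averages roughly 4.5 bits per parameter:

```python
def approx_weight_gib(params_billion: float, bits_per_param: float) -> float:
    """Approximate GiB needed just for model weights at a given quantization."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# q4_K_M averages ~4.5 bits/param; fp16 would be 16.
for size in (8, 14, 70):
    print(f"{size}B @ ~4.5-bit: ~{approx_weight_gib(size, 4.5):.1f} GiB weights")
```

An 8B model's ~4 GiB of weights fits comfortably in 16 GB of RAM once you add context cache and the OS; a 70B model's ~37 GiB of weights alone is why consumer hardware struggles.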

What to watch for

In the first week of running any new local model, do this:

  1. Paper-trade only. system.mode: paper in the config.
  2. Open the dashboard's audit log and spot-check 20 random AI decisions per day. If the reasoning field cites indicators that weren't in the snapshot — even once — tighten min_confidence and/or switch to a larger model.
  3. Watch the Sharpe and max drawdown in the backtester. A local model whose Sharpe drops below 0.5 on realistic data is telling you its signal is not worth acting on even after the gate.
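If you want to sanity-check the dashboard's numbers yourself, both metrics are a few lines from first principles (annualized Sharpe with the risk-free rate assumed zero, and max drawdown as the worst peak-to-trough drop):

```python
import math

def sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio; risk-free rate assumed zero for simplicity."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

def max_drawdown(equity_curve):
    """Largest peak-to-trough drop, as a fraction of the running peak."""
    peak, worst = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

returns = [0.004, -0.002, 0.003, -0.001, 0.002, -0.003, 0.005]
equity = [10_000, 10_200, 9_900, 10_300, 9_800, 10_400]
print(f"Sharpe:       {sharpe(returns):.2f}")
print(f"Max drawdown: {max_drawdown(equity):.1%}")
```

Run the same math over the backtester's daily-returns export; if your numbers and the dashboard's disagree materially, trust neither until you know why.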

Where to go next