How to Run an AI Trading Bot Locally with Ollama (No API Keys)
If you want an AI trading bot but don't want to hand your portfolio snapshots to a cloud LLM, you have a third option: run the model on your own machine via Ollama, and point KlawTrade at it. No API keys, no marginal cost, no data leaving your network.
This guide shows the exact 15-minute setup. By the end, you'll have a bot where every trade signal is generated by Llama 3.1 or similar running on your laptop, and every signal still passes through the same deterministic 14-check risk gate that ships with KlawTrade.
Under the hood, KlawTrade's "local" provider reuses its OpenAI-compatible client and simply swaps the base_url. So the bot thinks it's talking to OpenAI, but it's really talking to a model on your laptop. Same code path, same schema validation, same risk gate.

Step 1 — Install Ollama and pull a model
Download Ollama from ollama.com/download. On macOS and Linux:
# macOS (Homebrew)
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Start the server
ollama serve &
# Pull a capable instruct-tuned model
ollama pull llama3.1:8b-instruct-q4_K_M

For trading decisions you want an instruct-tuned model with decent JSON discipline. Good picks, in order of recommendation:
- llama3.1:8b-instruct-q4_K_M — 8B parameters, runs on 16 GB RAM, excellent JSON compliance.
- qwen2.5:14b-instruct — a bit slower but noticeably better reasoning on technical indicators.
- mistral-nemo:12b-instruct — strong tool-use behaviour, similar quality to Qwen.
Step 2 — Install KlawTrade with AI extras
pip install "klawtrade[ai]"
klawtrade init   # generates config/settings.yaml

Step 3 — Configure the local provider
Open config/settings.yaml and update the strategy.ai block:
strategy:
  ai:
    enabled: true
    provider: "local"                         # <-- key line
    base_url: "http://localhost:11434/v1"     # Ollama default
    api_key: "ollama"                         # any non-empty string
    model: "llama3.1:8b-instruct-q4_K_M"
    temperature: 0.0
    min_confidence: 0.80                      # stricter for smaller models
    require_rule_confirmation: true           # belt + braces
    rules:
      momentum: true
      mean_reversion: true

Two notes on accuracy when using smaller local models:
- Bump min_confidence to 0.80 or higher. Local models tend to be overconfident; the threshold prunes marginal signals.
- Leave require_rule_confirmation: true. This guarantees every LLM-generated BUY/SELL also has a classic indicator supporting it (RSI, MACD, SMA cross, or Bollinger) — a strong guard against the kind of soft hallucinations that smaller models produce.
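To make the two guards concrete, here is an illustrative sketch (not KlawTrade's actual code — the function names, the snapshot keys, and the single-indicator RSI rule are all simplified assumptions) of a confidence floor combined with a classic-indicator cross-check:

```python
# Illustrative sketch of the two guards the config above enables:
# a confidence floor plus a classic-indicator confirmation.
MIN_CONFIDENCE = 0.80

def rule_confirms(action: str, snapshot: dict) -> bool:
    """At least one classic indicator must agree with the LLM's direction.
    Only RSI is shown here; the real gate also checks MACD, SMA cross, Bollinger."""
    rsi = snapshot.get("rsi_14")
    if action == "BUY":
        return rsi is not None and rsi < 30   # oversold supports a BUY
    if action == "SELL":
        return rsi is not None and rsi > 70   # overbought supports a SELL
    return True                               # HOLD needs no confirmation

def accept_signal(signal: dict, snapshot: dict) -> bool:
    if signal["confidence"] < MIN_CONFIDENCE:
        return False                          # prune marginal signals
    return rule_confirms(signal["action"], snapshot)

accept_signal({"action": "BUY", "confidence": 0.9}, {"rsi_14": 25})  # True
accept_signal({"action": "BUY", "confidence": 0.9}, {"rsi_14": 55})  # False: no rule support
accept_signal({"action": "BUY", "confidence": 0.6}, {"rsi_14": 25})  # False: below the floor
```

The point of the second guard: an LLM signal with no indicator backing never reaches the risk gate at all.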
Step 4 — Start the bot
klawtrade start

You should see a log line like:

INFO AI strategy enabled provider=local model=llama3.1:8b-instruct-q4_K_M

Check http://localhost:8080 for the real-time dashboard. Trades will start flowing as soon as the strategy finds a setup that satisfies both the model and the classic indicator cross-check — and, of course, the 14-check risk gate.
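For the curious, what travels over the wire at this point is just an OpenAI-style chat completion aimed at Ollama's /v1 endpoint. A hypothetical sketch of the request body (the real prompt and response schema are KlawTrade internals; everything below is assumed for illustration):

```python
import json

OLLAMA_BASE_URL = "http://localhost:11434/v1"   # same value as config's base_url
MODEL = "llama3.1:8b-instruct-q4_K_M"

def build_signal_request(snapshot: dict) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion."""
    return {
        "model": MODEL,
        "temperature": 0.0,
        "messages": [
            {"role": "system",
             "content": ("You are a trading analyst. Reply with JSON only: "
                         '{"action": "BUY|SELL|HOLD", "confidence": 0-1, '
                         '"reasoning": "..."}')},
            {"role": "user", "content": json.dumps(snapshot)},
        ],
    }

# The body would be POSTed to f"{OLLAMA_BASE_URL}/chat/completions".
body = build_signal_request({"symbol": "AAPL", "rsi_14": 28.4, "sma_50": 187.2})
```

Because the shape is identical to a cloud request, swapping providers later is a one-line config change.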
Accuracy: how good is this really?
Local models are not Claude or GPT-4. On our internal backtest bench (see the Claude vs GPT post), an 8B local model produced about 95% valid JSON responses versus 100% for Claude Sonnet 4. The post-risk-gate hit rate was within one percentage point of the cloud models, though — because the classic indicator cross-check and 14-check risk gate filter out most of the low-quality local-model signals before they become trades.
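That ~5% of malformed output is why strict parsing matters. A minimal sketch of the kind of validator that rejects unusable responses before they are ever scored (the schema and function name here are illustrative, not KlawTrade's exact ones):

```python
import json

VALID_ACTIONS = {"BUY", "SELL", "HOLD"}

def parse_signal(raw: str):
    """Return a validated signal dict, or None if the model's output is unusable."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None                                   # not JSON at all
    if obj.get("action") not in VALID_ACTIONS:
        return None                                   # invented or missing action
    conf = obj.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        return None                                   # confidence out of range
    return obj

parse_signal('{"action": "BUY", "confidence": 0.91, "reasoning": "RSI oversold"}')  # dict
parse_signal('Sure! Here is my answer: BUY')                                        # None
```

A rejected response simply means no trade that cycle, which is the safe default.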
The punchline: the gate does more work than the model. If you're comfortable with somewhat noisier signal generation in exchange for zero marginal cost and full data privacy, local-via-Ollama is a genuinely viable path.
Other local runtimes
Ollama is the easiest option, but KlawTrade works with any OpenAI-compatible endpoint. A few alternatives:
- LM Studio — GUI-driven. Set base_url: http://localhost:1234/v1.
- vLLM — production-grade serving. Set base_url: http://localhost:8000/v1.
- LocalAI — supports many backends. Set base_url: http://localhost:8080/v1 (note this clashes with KlawTrade's default dashboard port, so change one of the two).
Hardware requirements
- 8B models (Llama 3.1 / Qwen2.5-7B) — 16 GB RAM minimum, 24 GB comfortable. CPU-only works but is slow; an Apple Silicon M-series or an NVIDIA GPU with 8+ GB VRAM is much better.
- 13-14B models — 32 GB RAM, or a 16 GB VRAM GPU.
- 70B models — consumer hardware struggles. Use a dedicated GPU server or fall back to cloud.
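The RAM figures above follow from a back-of-envelope rule: a q4_K_M quantization spends roughly 4.5 bits per weight, plus a couple of GB for runtime overhead and KV cache. Both numbers are rough assumptions, not measurements, but the arithmetic is easy to sanity-check:

```python
def est_resident_gb(params_billion: float,
                    bits_per_weight: float = 4.5,   # ~q4_K_M; rough assumption
                    overhead_gb: float = 2.0) -> float:
    """Rough resident-memory estimate for a quantized model."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

est_resident_gb(8)    # ~6.5 GB  -> fits the 16 GB minimum with room to spare
est_resident_gb(14)   # ~9.9 GB  -> comfortable on 32 GB RAM or a 16 GB GPU
est_resident_gb(70)   # ~41 GB   -> beyond typical consumer hardware
```

If you change quantization (e.g. q8), scale bits_per_weight accordingly and recheck.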
What to watch for
In the first week of running any new local model, do this:
- Paper-trade only: set system.mode: paper in the config.
- Open the dashboard's audit log and spot-check 20 random AI decisions per day. If the reasoning field cites indicators that weren't in the snapshot — even once — tighten min_confidence and/or switch to a larger model.
- Watch the Sharpe and max drawdown in the backtester. A local model whose Sharpe drops below 0.5 on realistic data is telling you its signal is not worth acting on even after the gate.
Where to go next
- AI strategy reference — full config and the six accuracy guardrails.
- 14-check risk gate — what stops a bad signal from becoming a bad trade.
- Backtesting guide — test any local model on historical data before you go live.