Run AI models entirely on your hardware with Ollama. No data leaves your device.

Installation

# Download from ollama.ai
curl -fsSL https://ollama.ai/install.sh | sh

# Or use Homebrew
brew install ollama
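Either route installs the ollama CLI; a quick check that it is on your PATH:
ollama --version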

Quick Start

1. Start Ollama
ollama serve
Ollama runs on http://localhost:11434 by default.
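A quick way to confirm the server is reachable (assuming the default port) is to list the locally installed models over the API:
curl http://localhost:11434/api/tags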
2. Pull a Model
# Recommended for most users
ollama pull llama3.2

# For coding tasks
ollama pull codellama

# For fast responses
ollama pull phi3
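To sanity-check a download before wiring it into AeonSage, you can run the model once with a one-off prompt; the model name here is just the one pulled above:
ollama run llama3.2 "Say hello in one sentence."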
3. Configure AeonSage
AeonSage auto-detects Ollama, so no additional configuration is needed. Or specify it explicitly:
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434"
    }
  },
  "llm": {
    "defaultProvider": "ollama",
    "defaultModel": "llama3.2"
  }
}
4. Start Using
aeonsage gateway start
Messages will now use your local Ollama model.
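If replies don't come back, one way to separate AeonSage issues from Ollama issues is to call the Ollama chat API directly; a minimal sketch, assuming the default port and the llama3.2 model pulled earlier:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'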

Available Models

Model        | Size  | RAM | Use Case
llama3.2     | 2-3GB | 8GB | General purpose, best balance
llama3.2:1b  | 1.3GB | 4GB | Fast, lightweight
llama3.1:8b  | 4.7GB | 8GB | High quality, general purpose
codellama    | 3.8GB | 8GB | Code generation
mistral      | 4.1GB | 8GB | Efficient, multilingual
qwen2.5      | 4.7GB | 8GB | Strong reasoning
phi3         | 2.3GB | 6GB | Fast, efficient

Browse All Models

# List installed models
ollama list

# Show details for an installed model
ollama show llama3.2

# Pull specific version
ollama pull llama3.2:3b
See ollama.ai/library for all available models.
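Models occupy several gigabytes of disk each, so it is worth removing ones you no longer use:
# Remove a model you no longer need
ollama rm phi3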

Hardware Requirements

Model Size | Minimum RAM | Recommended RAM
1-3B       | 4GB         | 8GB
7-8B       | 8GB         | 16GB
13-14B     | 16GB        | 32GB
30B+       | 32GB        | 64GB+
Ollama automatically uses a GPU when one is available (NVIDIA, AMD, or Apple Silicon), which significantly improves inference speed.
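To check whether a loaded model is actually running on the GPU, ollama ps reports the CPU/GPU split while a model is in memory:
# Shows loaded models, their size, and the CPU/GPU processor split
ollama ps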

Configuration Options

Basic Configuration

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "timeout": 120000,
      "keepAlive": "5m"
    }
  }
}
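The keepAlive value controls how long the model stays loaded after a request (presumably forwarded to Ollama's keep_alive option); the same behavior can also be set server-wide with an environment variable Ollama itself supports:
# Keep models in memory for 10 minutes after the last request
OLLAMA_KEEP_ALIVE=10m ollama serve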

Model Parameters

{
  "providers": {
    "ollama": {
      "modelOptions": {
        "temperature": 0.7,
        "top_p": 0.9,
        "num_ctx": 4096,
        "num_predict": 2048
      }
    }
  }
}
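These names match Ollama's native model options, so you can experiment with values directly against the API before committing them to the config; a sketch assuming the default port:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"temperature": 0.7, "top_p": 0.9, "num_ctx": 4096, "num_predict": 2048}
}'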

Remote Ollama

Connect to Ollama on a different machine:
{
  "providers": {
    "ollama": {
      "baseUrl": "http://192.168.1.100:11434"
    }
  }
}
When exposing Ollama over the network, restrict access with appropriate firewall rules: Ollama has no built-in authentication.
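By default Ollama binds only to 127.0.0.1, so the remote machine has to be told to listen on a reachable address first (and the firewall warning above applies):
# On the remote machine: listen on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve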

Performance Tuning

Context Length

Adjust context window for your needs:
# Default context is 2048 tokens; raise it from inside an interactive session
ollama run llama3.2
>>> /set parameter num_ctx 4096
Or in configuration:
{
  "providers": {
    "ollama": {
      "modelOptions": {
        "num_ctx": 8192
      }
    }
  }
}
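If you want a larger context baked into a model variant rather than set per request, a Modelfile works too; the variant name llama3.2-8k below is arbitrary:
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER num_ctx 8192
EOF
ollama create llama3.2-8k -f Modelfile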

Parallel Requests

Ollama handles one request at a time by default. For concurrent requests:
OLLAMA_NUM_PARALLEL=4 ollama serve
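If you also want more than one model resident at the same time, OLLAMA_MAX_LOADED_MODELS is the companion setting:
# Serve up to 4 concurrent requests and keep up to 2 models loaded
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve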

GPU Layers

Control how many model layers are offloaded to the GPU with the num_gpu model parameter, for example from an interactive session:
ollama run llama3.2
>>> /set parameter num_gpu 40

Next Steps

You now have Ollama configured for local AI processing. In addition to local models, AeonSage supports cloud AI providers like OpenAI and Anthropic for different use cases. Explore the providers documentation to see all supported models and their capabilities, including specialized models for coding, reasoning, and multilingual tasks.