Run AI models entirely on your hardware with Ollama. No data leaves your device.

Installation

# Download from ollama.ai
curl -fsSL https://ollama.ai/install.sh | sh

# Or use Homebrew
brew install ollama
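Either route installs the ollama CLI; a quick check that it is on your PATH:
ollama --version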

Quick Start

1. Start Ollama
ollama serve
Ollama runs on http://localhost:11434 by default.
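A quick way to confirm the server is reachable (assuming the default port) is to list the locally installed models over the API:
curl http://localhost:11434/api/tags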
2. Pull a Model
# Recommended for most users
ollama pull llama3.2

# For coding tasks
ollama pull codellama

# For fast responses
ollama pull phi3
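To sanity-check a download before wiring it into AeonSage, you can run the model once with a one-off prompt; the model name here is just the one pulled above:
ollama run llama3.2 "Say hello in one sentence."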
3. Configure AeonSage
AeonSage auto-detects Ollama, so no additional configuration is needed. Or specify it explicitly:
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434"
    }
  },
  "llm": {
    "defaultProvider": "ollama",
    "defaultModel": "llama3.2"
  }
}
4. Start Using
aeonsage gateway start
Messages will now use your local Ollama model.
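If replies don't come back, one way to separate AeonSage issues from Ollama issues is to call the Ollama chat API directly; a minimal sketch, assuming the default port and the llama3.2 model pulled earlier:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'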

Available Models

Model        | Size  | RAM | Use Case
llama3.2     | 2-3GB | 8GB | General purpose, best balance
llama3.2:1b  | 1.3GB | 4GB | Fast, lightweight
llama3.1:8b  | 4.7GB | 8GB | High quality, general purpose
codellama    | 3.8GB | 8GB | Code generation
mistral      | 4.1GB | 8GB | Efficient, multilingual
qwen2.5      | 4.7GB | 8GB | Strong reasoning
phi3         | 2.3GB | 6GB | Fast, efficient

Browse All Models

# List installed models
ollama list

# Show details for an installed model
ollama show llama3.2

# Pull specific version
ollama pull llama3.2:3b
See ollama.ai/library for all available models.
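Models occupy several gigabytes of disk each, so it is worth removing ones you no longer use:
# Remove a model you no longer need
ollama rm phi3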

Hardware Requirements

Model Size | Minimum RAM | Recommended RAM
1-3B       | 4GB         | 8GB
7-8B       | 8GB         | 16GB
13-14B     | 16GB        | 32GB
30B+       | 32GB        | 64GB+
Ollama automatically uses a GPU when one is available (NVIDIA, AMD, or Apple Silicon), which significantly improves inference speed.
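To check whether a loaded model is actually running on the GPU, ollama ps reports the CPU/GPU split while a model is in memory:
# Shows loaded models, their size, and the CPU/GPU processor split
ollama ps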

Configuration Options

Basic Configuration

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "timeout": 120000,
      "keepAlive": "5m"
    }
  }
}
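The keepAlive value controls how long the model stays loaded after a request (presumably forwarded to Ollama's keep_alive option); the same behavior can also be set server-wide with an environment variable Ollama itself supports:
# Keep models in memory for 10 minutes after the last request
OLLAMA_KEEP_ALIVE=10m ollama serve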

Model Parameters

{
  "providers": {
    "ollama": {
      "modelOptions": {
        "temperature": 0.7,
        "top_p": 0.9,
        "num_ctx": 4096,
        "num_predict": 2048
      }
    }
  }
}
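These names match Ollama's native model options, so you can experiment with values directly against the API before committing them to the config; a sketch assuming the default port:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"temperature": 0.7, "top_p": 0.9, "num_ctx": 4096, "num_predict": 2048}
}'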

Remote Ollama

Connect to Ollama on a different machine:
{
  "providers": {
    "ollama": {
      "baseUrl": "http://192.168.1.100:11434"
    }
  }
}
When exposing Ollama over the network, restrict access with appropriate firewall rules: Ollama has no built-in authentication.
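By default Ollama binds only to 127.0.0.1, so the remote machine has to be told to listen on a reachable address first (and the firewall warning above applies):
# On the remote machine: listen on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve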

Performance Tuning

Context Length

Adjust context window for your needs:
# Default context is 2048 tokens; raise it from inside an interactive session
ollama run llama3.2
>>> /set parameter num_ctx 4096
Or in configuration:
{
  "providers": {
    "ollama": {
      "modelOptions": {
        "num_ctx": 8192
      }
    }
  }
}
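If you want a larger context baked into a model variant rather than set per request, a Modelfile works too; the variant name llama3.2-8k below is arbitrary:
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER num_ctx 8192
EOF
ollama create llama3.2-8k -f Modelfile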

Parallel Requests

Ollama handles one request at a time by default. For concurrent requests:
OLLAMA_NUM_PARALLEL=4 ollama serve
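If you also want more than one model resident at the same time, OLLAMA_MAX_LOADED_MODELS is the companion setting:
# Serve up to 4 concurrent requests and keep up to 2 models loaded
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve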

GPU Layers

Control how many model layers are offloaded to the GPU with the num_gpu model parameter, for example from an interactive session:
ollama run llama3.2
>>> /set parameter num_gpu 40

Next Steps

You now have Ollama configured for local AI processing. In addition to local models, AeonSage supports cloud AI providers like OpenAI and Anthropic for different use cases. Explore the providers documentation to see all supported models and their capabilities, including specialized models for coding, reasoning, and multilingual tasks.