Run AI models entirely on your hardware with Ollama. No data leaves your device.
Why Ollama?
Installation
# Install with the official script
curl -fsSL https://ollama.ai/install.sh | sh
# Or use Homebrew (macOS)
brew install ollama
Alternatively, download the installer directly from ollama.ai.
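Once installed, confirm the CLI is available on your PATH:
ollama --version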
Quick Start
1. Start Ollama
Ollama runs on http://localhost:11434 by default.
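If you installed Ollama as a service it is usually already running; otherwise start the server manually, then check that the API answers (the root endpoint simply replies "Ollama is running"):
ollama serve
# In a second terminal:
curl http://localhost:11434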
2. Pull a Model
# Recommended for most users
ollama pull llama3.2
# For coding tasks
ollama pull codellama
# For fast responses
ollama pull phi3
3. Configure AeonSage
AeonSage auto-detects Ollama, so no additional configuration is needed. To specify the provider explicitly:
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434"
    }
  },
  "llm": {
    "defaultProvider": "ollama",
    "defaultModel": "llama3.2"
  }
}
4. Start Using
Messages will now use your local Ollama model.
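Independent of AeonSage, a quick way to confirm the model itself responds is a one-off prompt from the CLI (the prompt text is arbitrary):
ollama run llama3.2 "Reply with one short sentence to confirm you are working."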
Available Models
Recommended Models
| Model | Size | RAM | Use Case |
| --- | --- | --- | --- |
| llama3.2 | 2-3GB | 8GB | General purpose, best balance |
| llama3.2:1b | 1.3GB | 4GB | Fast, lightweight |
| llama3.1:8b | 4.7GB | 8GB | High quality, general purpose |
| codellama | 3.8GB | 8GB | Code generation |
| mistral | 4.1GB | 8GB | Efficient, multilingual |
| qwen2.5 | 4.7GB | 8GB | Strong reasoning |
| phi3 | 2.3GB | 6GB | Fast, efficient |
Browse All Models
# List installed models
ollama list
# Show details for an installed model
ollama show llama3.2
# Pull specific version
ollama pull llama3.2:3b
See ollama.ai/library for all available models.
Hardware Requirements
| Model Size | Minimum RAM | Recommended RAM |
| --- | --- | --- |
| 1-3B | 4GB | 8GB |
| 7-8B | 8GB | 16GB |
| 13-14B | 16GB | 32GB |
| 30B+ | 32GB | 64GB+ |
Ollama automatically uses GPU if available (NVIDIA, AMD, Apple Silicon). This significantly improves speed.
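A quick way to check whether a model is actually running on the GPU is ollama ps; its PROCESSOR column reports the GPU/CPU split (exact output varies by version and hardware):
ollama run llama3.2 "hello"
ollama ps
# PROCESSOR should show something like "100% GPU" when offloading works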
Configuration Options
Basic Configuration
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "timeout": 120000,
      "keepAlive": "5m"
    }
  }
}
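The keepAlive value presumably mirrors Ollama's own keep_alive setting, which controls how long a model stays loaded in memory after its last request (that mapping is an assumption about AeonSage's config). The server-side equivalent is the standard OLLAMA_KEEP_ALIVE environment variable:
OLLAMA_KEEP_ALIVE=10m ollama serve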
Model Parameters
{
  "providers": {
    "ollama": {
      "modelOptions": {
        "temperature": 0.7,
        "top_p": 0.9,
        "num_ctx": 4096,
        "num_predict": 2048
      }
    }
  }
}
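These names appear to mirror Ollama's native request options, so you can try the same values directly against the Ollama API before wiring them into AeonSage (the model and prompt here are just examples):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain what num_ctx controls in one sentence.",
  "stream": false,
  "options": { "temperature": 0.7, "top_p": 0.9, "num_ctx": 4096, "num_predict": 2048 }
}'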
Remote Ollama
Connect to Ollama on a different machine:
{
  "providers": {
    "ollama": {
      "baseUrl": "http://192.168.1.100:11434"
    }
  }
}
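For this to work, the remote machine must run Ollama bound to an address reachable from your network rather than the default 127.0.0.1, typically via the OLLAMA_HOST environment variable:
OLLAMA_HOST=0.0.0.0:11434 ollama serve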
When exposing Ollama over the network, make sure appropriate firewall rules are in place. Ollama has no built-in authentication.
Context Length
Adjust context window for your needs:
# The default context window is 2048 tokens.
# Set num_ctx for an interactive session:
ollama run llama3.2
>>> /set parameter num_ctx 4096
Or in configuration:
{
  "providers": {
    "ollama": {
      "modelOptions": {
        "num_ctx": 8192
      }
    }
  }
}
Parallel Requests
Ollama handles one request at a time by default. For concurrent requests:
OLLAMA_NUM_PARALLEL=4 ollama serve
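If Ollama runs as a systemd service (the default for Linux installs), setting the variable on an ad-hoc ollama serve won't affect the service; a sketch of the usual approach is a service override followed by a restart:
sudo systemctl edit ollama.service
# Add under [Service]:
#   Environment="OLLAMA_NUM_PARALLEL=4"
sudo systemctl daemon-reload
sudo systemctl restart ollama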
GPU Layers
Control how many model layers are offloaded to the GPU with the num_gpu parameter:
ollama run llama3.2
>>> /set parameter num_gpu 40
Troubleshooting
Next Steps
You now have Ollama configured for local AI processing. In addition to local models, AeonSage supports cloud AI providers like OpenAI and Anthropic for different use cases. Explore the providers documentation to see all supported models and their capabilities, including specialized models for coding, reasoning, and multilingual tasks.