Ollama is the easiest way to run large language models on your own machine. No cloud, no API keys, no BS.
## Quick Deploy
```bash
# macOS/Linux - one-line install
curl -fsSL https://ollama.com/install.sh | sh

# Windows - download the installer from ollama.com

# Pull and run a model
ollama run llama2
ollama run mistral
ollama run codellama
```
## API Usage
```bash
# Ollama runs a local API server on port 11434
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```
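By default this endpoint streams its answer back as newline-delimited JSON chunks rather than a single JSON object. Here's a minimal sketch of reading that stream (shown in Python, which the next section covers in more depth), assuming `llama2` has already been pulled:

```python
import json
import requests

# /api/generate streams newline-delimited JSON by default;
# each chunk carries a "response" fragment until "done" is true
with requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'llama2', 'prompt': 'Why is the sky blue?'},
    stream=True,
) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            break
print()
```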
## Python Integration
```python
import requests

# Ask for a single JSON response; the API streams chunks by default
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama2',
    'prompt': 'Write a haiku about coding',
    'stream': False
})
print(response.json()['response'])
```
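The server also exposes a chat-style endpoint that takes a list of role-tagged messages instead of a bare prompt. A sketch, assuming a reasonably recent Ollama build (the prompt is just an example):

```python
import requests

# /api/chat accepts a message history; the reply comes back
# under 'message' rather than 'response'
response = requests.post('http://localhost:11434/api/chat', json={
    'model': 'llama2',
    'messages': [
        {'role': 'user', 'content': 'Explain recursion in one sentence.'}
    ],
    'stream': False
})
print(response.json()['message']['content'])
```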
## Use Cases
- **Private AI Assistant**: Run ChatGPT-like conversations without sending data to the cloud. Perfect for sensitive work.
- **Local Development**: Test AI features without burning API credits.
- **Offline Coding Help**: Get code suggestions even without internet.
- **Custom Chatbots**: Build AI apps that run entirely on-premise (see the sketch after this list).
- **Learning & Experimentation**: Try different models at no cost.
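To make the on-premise chatbot idea concrete, here's a minimal sketch of a multi-turn loop over the chat endpoint shown earlier; the model name and the simple exit commands are assumptions:

```python
import requests

history = []  # accumulated conversation, sent in full on every turn

while True:
    user_input = input('you> ')
    if user_input in ('quit', 'exit'):
        break
    history.append({'role': 'user', 'content': user_input})
    response = requests.post('http://localhost:11434/api/chat', json={
        'model': 'llama2',
        'messages': history,
        'stream': False
    })
    reply = response.json()['message']['content']
    history.append({'role': 'assistant', 'content': reply})
    print('bot>', reply)
```

Because the full history is resent each turn, the model keeps context without any server-side session state.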
## Popular Models
- `llama2` - Meta's general-purpose model
- `mistral` - Fast and capable
- `codellama` - Optimized for code
- `vicuna` - Great for chat
- `neural-chat` - Intel's optimized model
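You can check which models are already installed locally via the API's tags endpoint. A quick sketch:

```python
import requests

# GET /api/tags lists the models Ollama has downloaded locally
for model in requests.get('http://localhost:11434/api/tags').json()['models']:
    print(model['name'])
```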
## Hardware
- 7B models: 8GB RAM minimum
- 13B models: 16GB RAM
- 70B models: 64GB+ RAM