LocalAI is a self-hosted, OpenAI-compatible API server. Your existing OpenAI code works with local models; just change the base URL.
Quick Deploy
```bash
# Docker (easiest)
docker run -p 8080:8080 localai/localai

# Or with a local models directory mounted
docker run -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai
```
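Once the container is up, a quick sanity check is to list the models the server exposes. This is a minimal sketch, assuming the container above is listening on localhost:8080 and returns the standard OpenAI-style response shape from `/v1/models`:

```python
import requests

# Ask the local server which models it currently knows about.
# An empty list usually just means nothing has been added to the
# mounted /models directory yet.
resp = requests.get("http://localhost:8080/v1/models")
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```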
Use with OpenAI SDK
```python
# Uses the legacy (pre-1.0) OpenAI Python SDK interface
import openai

# Just change the base URL!
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "not-needed"

response = openai.ChatCompletion.create(
    model="llama2",  # must match a model available to your LocalAI instance
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
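The snippet above targets the legacy (pre-1.0) OpenAI Python SDK. On the 1.x SDK, the equivalent is to pass the base URL to the client constructor; a sketch, assuming the same local server and a model named `llama2` configured in your models directory:

```python
from openai import OpenAI

# Point the 1.x client at LocalAI instead of api.openai.com
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama2",  # must match a model configured on your LocalAI instance
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```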
Use Cases
- Cost Savings: stop paying per API call and run unlimited queries locally.
- Privacy: keep sensitive data on-premises.
- Development: test AI features without burning credits.
- Offline Apps: build applications that work without an internet connection.
- Custom Models: use fine-tuned models behind familiar APIs.
- Enterprise: self-hosted AI for compliance requirements.
Supported APIs
- Chat completions (`/v1/chat/completions`)
- Completions (`/v1/completions`)
- Embeddings (`/v1/embeddings`)
- Image generation (`/v1/images/generations`)
- Audio transcription (`/v1/audio/transcriptions`)
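Because these routes mirror the OpenAI API, the other endpoints work through the same SDK client. A sketch of the embeddings endpoint, where `my-embedding-model` is a placeholder for whichever embedding model you have actually configured:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Embed a couple of strings; the model name below is a placeholder.
emb = client.embeddings.create(
    model="my-embedding-model",
    input=["LocalAI speaks the OpenAI API", "so embeddings work the same way"],
)
print(len(emb.data), "vectors of length", len(emb.data[0].embedding))
```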
Compatible Models
- LLaMA, LLaMA 2
- Mistral, Mixtral
- Falcon
- GPT4All models
- Stable Diffusion (for images)
- Whisper (for audio)
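For example, with a Stable Diffusion backend configured, image generation goes through the same client. This is a sketch; the model name `stablediffusion` is an assumption and must match whatever name you gave the model in your setup:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Generate one image; the model name must match your LocalAI configuration.
result = client.images.generate(
    model="stablediffusion",
    prompt="a lighthouse on a cliff at sunset",
    size="512x512",
)
print(result.data[0].url)
```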
Pro Tips
- Use GPU for 10x+ speedup
- Quantized models use less RAM
- Multiple models can run simultaneously
- Check model compatibility before downloading