Deploying Models
Using Hosted LLM
- Hosted Playgrounds
- Hugging face spaces
- LM Arena
- Groq Chat
- Google Colab
- Kaggle Notebooks
- OpenAI Playground
- Anthropic Console
- Google AI Studio
- Managed Inference (Hosted Models)
- Open Source
- Groq
- Fireworks AI
- HuggingFace Inference API
- Open Source + Proprietary
- Vendor Specific
- OpenAI API
- Anthropic API
- Google Gemini API
- Mistral API
- Managed GenAI Platform
- Azure AI Foundry
- Google Vertex AI
- Amazon Bedrock
Deploying Open Source LLM
- https://www.youtube.com/watch?v=vehYE1DfkZg
- Local
- Personal Use
- Ollama
- llama.cpp
- LM Studio
- Production
- Expose to Internet
- CloudFlare Tunnel
- Tailscale
- Nginx
- VPS (Virtual Private Server)
- Workflows
- Run your (apps + model) on VPS
- Run your app on VPS + Model on Local
- Primarily CPU VPS
- Hetzner: Raw computing power (CPU/GPU) for price with their own datacenters
- Hostinger: Beginner friendly
- DigitalOcean: Has 1-click apps and managed K8s
- GPU VPS
- Vast.ai: GPU Marketplace
- Runpod.io
- Jarvislabs.ai
- Edge Devices
- LiteRT-LM (supports Metal GPU Acceleration)
- llama.cpp (supports Metal GPU Acceleration)
- Apple MLX (iPhones only)