Deploying Models

Using Hosted LLM

  • Hosted Playgrounds
    • Hugging face spaces
    • LM Arena
    • Groq Chat
    • Google Colab
    • Kaggle Notebooks
    • OpenAI Playground
    • Anthropic Console
    • Google AI Studio
  • Managed Inference (Hosted Models)
    • Open Source
      • Groq
      • Fireworks AI
      • HuggingFace Inference API
    • Open Source + Proprietary
      • Replicate
      • OpenRouter
    • Vendor Specific
      • OpenAI API
      • Anthropic API
      • Google Gemini API
      • Mistral API
  • Managed GenAI Platform
    • Azure AI Foundry
    • Google Vertex AI
    • Amazon Bedrock

Deploying Open Source LLM

  • https://www.youtube.com/watch?v=vehYE1DfkZg
  • Local
    • Personal Use
      • Ollama
      • llama.cpp
      • LM Studio
    • Production
      • vLLM
      • TGI
      • SGLang
    • Expose to Internet
      • CloudFlare Tunnel
      • Tailscale
      • Nginx
  • VPS (Virtual Private Server)
    • Workflows
      • Run your (apps + model) on VPS
      • Run your app on VPS + Model on Local
    • Primarily CPU VPS
      • Hetzner: Raw computing power (CPU/GPU) for price with their own datacenters
      • Hostinger: Beginner friendly
      • DigitalOcean: Has 1-click apps and managed K8s
    • GPU VPS
      • Vast.ai: GPU Marketplace
      • Runpod.io
      • Jarvislabs.ai
  • Edge Devices
    • LiteRT-LM (supports Metal GPU Acceleration)
    • llama.cpp (supports Metal GPU Acceleration)
    • Apple MLX (iPhones only)