vLLM
🌍 International 📖 Open Source ✅ Free
UC Berkeley open-source high-throughput LLM inference engine with PagedAttention. Self-host any open-source model and expose an OpenAI-compatible API.
🎁 Free Tier
Daily Limit: Apache-2.0 open-source.
| Model | Context | Limit | Notes |
|---|---|---|---|
| OpenAI-compatible server | Depends on the model you serve | Hardware-bound | vLLM is an inference engine, not a hosted quota product; you serve whatever model you deploy. |
🔑 Free API
Free Credits: Self-hosted OpenAI-compatible API; no vendor credits required.
Rate Limit: Hardware-bound; depends on GPU memory, model size, and concurrency.
vLLM can turn open models into an OpenAI-compatible API for private deployments, lower-cost inference, and high throughput.