A high-throughput and memory-efficient inference and serving engine for LLMs
📊 Project Info
- Language: Python
- Stars: ⭐ 81,183
- Forks: 17,321
- Today: +121
- Ranking: #8
- Collection: Language
- Trending Date: May 27, 2026
- Last Push: 5/27/2026
🏷️ Topics
amd, blackwell, cuda, deepseek, deepseek-v3, gpt, gpt-oss, inference, kimi, llama, llm, llm-serving, model-serving, moe, openai, pytorch, qwen, qwen3, tpu, transformer