[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
📊 Project Info
- Language: Cuda
- Stars: ⭐ 3,397
- Forks: 426
- Today: +2
- Ranking: #7
- Collection: Language
- Trending Date: May 31, 2026
- Last Push: 1/17/2026
🏷️ Topics
attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit


