thu-ml / SageAttention

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

📊 Project Info

Language: Cuda
Stars: 3,403
Forks: 425
Today: +3
Ranking: #9
Collection: Language
Trending Date: June 2, 2026
Last Push: January 17, 2026

🏷️ Topics

attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit
