Cuda Trending

Trending Cuda repos on GitHub · last 7 days

karpathy / llm.c
#7
LLM training in simple, raw C/CUDA
30,107 stars · 3,623 forks · +9
Cuda

alibaba / rtp-llm
#1
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
1,178 stars · 204 forks · +8
Cuda
gpt, inference, llama, llm, llm-serving

deepseek-ai / DeepGEMM
#2
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
7,330 stars · 1,019 forks · +6
Cuda

NVIDIA / cuopt
#14
GPU accelerated decision optimization
918 stars · 184 forks · +5
Cuda
cuda, gpu, linear-programming, optimization

thu-ml / SageAttention
#6
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
3,407 stars · 425 forks · +4
Cuda
attention, cuda, efficient-attention, inference-acceleration, llm

deepseek-ai / DeepEP
#4
DeepEP: an efficient expert-parallel communication library
9,694 stars · 1,276 forks · +4
Cuda

mirage-project / mirage
#15
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
2,289 stars · 214 forks · +2
Cuda

brucefan1983 / GPUMD
#3
Graphics Processing Units Molecular Dynamics
782 stars · 186 forks · +2
Cuda
cuda, gpu, gpumd, heat-transport, high-performance-computing

NVlabs / instant-ngp
#13
Instant neural graphics primitives: lightning fast NeRF and more
17,414 stars · 2,065 forks · +1
Cuda
3d-reconstruction, computer-graphics, computer-vision, cuda, function-approximation

HenryHuYu / DiffPhysDrone
#10
Published on Nature Machine Intelligence! The first real robot(quadrotor) based on differentiable physics training.
559 stars · 82 forks · +1
Cuda
drone, end-to-end, reinforcement-learning, robotics

Dao-AILab / causal-conv1d
#9
Causal depthwise conv1d in CUDA, with a PyTorch interface
893 stars · 188 forks · +1
Cuda

NVIDIA / CUDALibrarySamples
#5
CUDA Library Samples
2,423 stars · 459 forks · +1
Cuda
cuda, cudss, cufft, curand, cusolver

HazyResearch / ThunderKittens
#12
Tile primitives for speedy kernels
3,405 stars · 290 forks
Cuda

rapidsai / cugraph
#11
cuGraph - RAPIDS Graph Analytics Library
2,187 stars · 357 forks
Cuda
complex-networks, cuda, gpu, graph, graph-algorithms

NVIDIA / nccl-tests
#8
NCCL Tests
1,539 stars · 375 forks
Cuda

alibaba / rtp-llm
#4
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
1,170 stars · 204 forks · +16
Cuda
gpt, inference, llama, llm, llm-serving

karpathy / llm.c
#3
LLM training in simple, raw C/CUDA
30,103 stars · 3,622 forks · +13
Cuda

deepseek-ai / DeepGEMM
#1
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
7,325 stars · 1,014 forks · +6
Cuda

HazyResearch / ThunderKittens
#12
Tile primitives for speedy kernels
3,405 stars · 290 forks · +4
Cuda

deepseek-ai / DeepEP
#6
DeepEP: an efficient expert-parallel communication library
9,693 stars · 1,273 forks · +4
Cuda

NVIDIA / CUDALibrarySamples
#10
CUDA Library Samples
2,422 stars · 458 forks · +3
Cuda
cuda, cudss, cufft, curand, cusolver

thu-ml / SageAttention
#9
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
3,403 stars · 425 forks · +3
Cuda
attention, cuda, efficient-attention, inference-acceleration, llm

NVlabs / instant-ngp
#13
Instant neural graphics primitives: lightning fast NeRF and more
17,414 stars · 2,065 forks · +2
Cuda
3d-reconstruction, computer-graphics, computer-vision, cuda, function-approximation

Dao-AILab / causal-conv1d
#11
Causal depthwise conv1d in CUDA, with a PyTorch interface
892 stars · 188 forks · +2
Cuda

rapidsai / cuvs
#8
cuVS - a library for vector search and clustering on the GPU
772 stars · 192 forks · +2
Cuda
anns, clustering, cuda, distance, gpu

NVIDIA / nccl-tests
#7
NCCL Tests
1,539 stars · 375 forks · +2
Cuda

NVIDIA / cuopt
#2
GPU accelerated decision optimization
915 stars · 183 forks · +2
Cuda
cuda, gpu, linear-programming, optimization

brucefan1983 / GPUMD
#5
Graphics Processing Units Molecular Dynamics
780 stars · 186 forks · +1
Cuda
cuda, gpu, gpumd, heat-transport, high-performance-computing
