CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case #12183
gaugarg-nv force pushed from bb46418c to be39646a 247 days ago
gaugarg-nv force pushed from be39646a to 76881ac5 247 days ago
gaugarg-nv changed the title from "CUDA: Improve flash decoding kernel occupancy for BS=1 case" to "CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case" 246 days ago
gaugarg-nv force pushed from a8a71758 to d83b0d07 244 days ago
gaugarg-nv force pushed from d83b0d07 to b6e067b9 235 days ago
CUDA: determine FA parallel blocks at runtime
2d011e65
CUDA: Improve flash decoding kernel occupancy for BS=1 case
aa5aa01d
gaugarg-nv force pushed from b6e067b9 to aa5aa01d 232 days ago
consider tail effects for parallel_blocks
66d873b6
IMbackK approved these changes on 2025-03-19
ggerganov approved these changes on 2025-03-19
IMbackK merged 517b5ddb into master 232 days ago
gaugarg-nv deleted the flash_decoding_improvement branch 232 days ago
Assignees
No one assigned
Labels
testing
Nvidia GPU
ggml