CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case #12183
gaugarg-nv
changed the title from "CUDA: Improve flash decoding kernel occupancy for BS=1 case" to "CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case" 284 days ago
CUDA: determine FA parallel blocks at runtime
2d011e65
CUDA: Improve flash decoding kernel occupancy for BS=1 case
aa5aa01d
gaugarg-nv
force pushed from b6e067b9 to aa5aa01d 270 days ago
consider tail effects for parallel_blocks
66d873b6
IMbackK
approved these changes
on 2025-03-19
ggerganov
approved these changes
on 2025-03-19
IMbackK
merged
517b5ddb
into master 270 days ago
gaugarg-nv
deleted the flash_decoding_improvement branch 269 days ago
Assignees
No one assigned
Labels
testing
Nvidia GPU
ggml