llama.cpp
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case
#12183
Merged

gaugarg-nv requested a review from JohannesGaessler 285 days ago
github-actions added Nvidia GPU
github-actions added ggml
gaugarg-nv force pushed 285 days ago
gaugarg-nv force pushed 285 days ago
gaugarg-nv changed the title from "CUDA: Improve flash decoding kernel occupancy for BS=1 case" to "CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case" 284 days ago
ggerganov commented on 2025-03-06
JohannesGaessler commented on 2025-03-06
gaugarg-nv force pushed 282 days ago
github-actions added testing
gaugarg-nv force pushed to b6e067b9 273 days ago
Commit 2d011e65 by JohannesGaessler: CUDA: determine FA parallel blocks at runtime
Commit aa5aa01d by gaugarg-nv: CUDA: Improve flash decoding kernel occupancy for BS=1 case
gaugarg-nv force pushed from b6e067b9 to aa5aa01d 270 days ago
Commit 66d873b6 by JohannesGaessler: consider tail effects for parallel_blocks
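
The commit messages above ("CUDA: determine FA parallel blocks at runtime", "consider tail effects for parallel_blocks") point at the core idea: for small batches, decide at runtime how many blocks the KV sequence of each attention head is split across so the kernel grid fills the GPU, while avoiding a nearly empty last wave. Below is a minimal sketch of that idea, not the code merged in this PR; the function and parameter names (pick_parallel_blocks, n_heads, n_sm, max_blocks) are hypothetical and chosen only for illustration.

```cpp
// Sketch: score each candidate split of the KV sequence by how fully it
// occupies whole waves of streaming multiprocessors, and keep the best one.
static int pick_parallel_blocks(int n_heads, int n_sm, int max_blocks) {
    int   best            = 1;
    float best_efficiency = 0.0f;
    for (int pb = 1; pb <= max_blocks; ++pb) {
        const int total_blocks = n_heads * pb;                     // grid size for this split
        const int waves        = (total_blocks + n_sm - 1) / n_sm; // full + partial waves
        // Efficiency = useful blocks / blocks the GPU effectively schedules
        // (rounded up to whole waves); a sparsely filled last wave lowers it.
        const float efficiency = (float) total_blocks / (float) (waves * n_sm);
        if (efficiency > best_efficiency) {
            best_efficiency = efficiency;
            best            = pb;
        }
    }
    return best;
}
```

With BS=1 and no splitting, only n_heads blocks are launched, which can leave most SMs idle; scoring candidate splits by wave occupancy is one way to trade that off against the tail effect named in the last commit.
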
JohannesGaessler approved these changes on 2025-03-19
IMbackK approved these changes on 2025-03-19
ggerganov approved these changes on 2025-03-19
IMbackK merged 517b5ddb into master 270 days ago
gaugarg-nv deleted the flash_decoding_improvement branch 269 days ago
