llama.cpp
CUDA: fuse rope + set_rows
#16884
Merged

am17an
am17an requested a review from CISC 47 days ago
am17an requested a review from slaren 47 days ago
github-actions added the Nvidia GPU label
github-actions added the ggml label
am17an commented on 2025-10-31
am17an requested a review from JohannesGaessler 47 days ago
am17an commented on 2025-10-31
am17an force pushed 47 days ago
ORippler commented on 2025-10-31
JohannesGaessler commented on 2025-11-02
am17an: CUDA: add fused rope (acfd03d3)
am17an: move k forward_expand up (b3761df3)
am17an: create helper function instead of re-using params (67b6580b)
am17an: make assert statement more in line with comment (c7c3b9f1)
am17an: rope_norm: coalesced writes to global mem (67624935)
am17an force pushed to 67624935 35 days ago
am17an requested a review from JohannesGaessler 35 days ago
JohannesGaessler approved these changes on 2025-11-12
am17an merged a90eb94c into master 34 days ago
am17an deleted the cuda-add-rope-fusion branch 34 days ago
