llama.cpp
CUDA: add fused rms norm #14800
Merged

am17an
am17an requested a review from JohannesGaessler 144 days ago
github-actions added Nvidia GPU
github-actions added ggml
JohannesGaessler commented on 2025-07-21
JohannesGaessler approved these changes on 2025-07-22
JohannesGaessler commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
am17an requested a review from JohannesGaessler 143 days ago
github-actions added testing
CISC commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
Commits (am17an):
b41ea163  CUDA: add fused rms norm
a8b1b872  assume mul_ptr is not null when calling fused ops, formatting changes
0c6d097a  Replace mul_ptr with mul
db341d2a  Use mul tensor for broadcast
f38c610c  Add testcase about the broadcast
2ebe86ac  Fix test print
ed9f84e2  Fix condition for broadcast
am17an force pushed to ed9f84e2 143 days ago
JohannesGaessler approved these changes on 2025-07-22
am17an merged 8c988fa4 into master 143 days ago
am17an deleted the cuda_fused_rms_norm branch 143 days ago
