CUDA: add fused rms norm #14800
CISC commented on 2025-07-22
CUDA: add fused rms norm (b41ea163)
assume mul_ptr is not null when calling fused ops, formatting changes (a8b1b872)
Replace mul_ptr with mul (0c6d097a)
Use mul tensor for broadcast (db341d2a)
Add testcase about the broadcast (f38c610c)
Fix test print (2ebe86ac)
Fix condition for broadcast (ed9f84e2)
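The commits above appear to fuse an RMS norm with the element-wise multiply that follows it, and later allow the `mul` tensor to be broadcast. To illustrate what such a fused op computes, here is a minimal CPU reference sketch in C++. It is not the CUDA kernel from this PR. The function name `fused_rms_norm_mul`, the row-major layout, and the row-cycling broadcast rule (`row % mul_rows`) are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a fused "RMS norm, then multiply" op.
// x is row-major [nrows x ncols]; mul is row-major [mul_rows x ncols].
// Assumed broadcast rule: output row r uses mul row (r % mul_rows).
std::vector<float> fused_rms_norm_mul(const std::vector<float>& x,
                                      const std::vector<float>& mul,
                                      std::size_t nrows, std::size_t ncols,
                                      std::size_t mul_rows, float eps) {
    // Broadcasting only makes sense if mul's rows tile x's rows evenly.
    assert(mul_rows > 0 && nrows % mul_rows == 0);
    std::vector<float> out(nrows * ncols);
    for (std::size_t r = 0; r < nrows; ++r) {
        const float* row = x.data() + r * ncols;
        // Mean of squares over the row (a GPU kernel would typically do
        // this reduction cooperatively within one block).
        double sum_sq = 0.0;
        for (std::size_t c = 0; c < ncols; ++c) sum_sq += double(row[c]) * row[c];
        const float scale = 1.0f / std::sqrt(float(sum_sq / ncols) + eps);
        // Pick the broadcast row of mul for this output row.
        const float* w = mul.data() + (r % mul_rows) * ncols;
        // Normalize and multiply in the same pass, so the normalized row
        // is never stored as a separate intermediate.
        for (std::size_t c = 0; c < ncols; ++c) out[r * ncols + c] = row[c] * scale * w[c];
    }
    return out;
}
```

For example, `fused_rms_norm_mul({3, 4, 1, 1}, {1, 2}, 2, 2, 1, 0.0f)` reuses the single `mul` row for both rows of `x`. The general motivation for fusing is that the normalized values do not need a round trip through global memory before the multiply. The sketch only mirrors that by doing both steps in one loop.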
am17an force pushed to ed9f84e2 143 days ago
am17an merged 8c988fa4 into master 143 days ago
am17an deleted the cuda_fused_rms_norm branch 143 days ago
Assignees: No one assigned
Labels: testing, Nvidia GPU, ggml