llama.cpp
CUDA: add fused rms norm #14800
Merged

am17an
am17an requested a review from JohannesGaessler 144 days ago
github-actions added Nvidia GPU
github-actions added ggml
JohannesGaessler commented on 2025-07-21
JohannesGaessler approved these changes on 2025-07-22
JohannesGaessler commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
am17an requested a review from JohannesGaessler 143 days ago
github-actions added testing
CISC commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
JohannesGaessler commented on 2025-07-22
Commits (am17an):
b41ea163  CUDA: add fused rms norm
a8b1b872  assume mul_ptr is not null when calling fused ops, formatting changes
0c6d097a  Replace mul_ptr with mul
db341d2a  Use mul tensor for broadcast
f38c610c  Add testcase about the broadcast
2ebe86ac  Fix test print
ed9f84e2  Fix condition for broadcast
am17an force pushed to ed9f84e2 143 days ago
JohannesGaessler approved these changes on 2025-07-22
am17an merged 8c988fa4 into master 143 days ago
am17an deleted the cuda_fused_rms_norm branch 143 days ago
