vulkan: optimize rms_norm, and allow the work to spread across multiple SMs #15281
jeffbolznv
marked this pull request as draft 155 days ago
jeffbolznv
marked this pull request as ready for review 151 days ago
0cc4m
commented
on 2025-08-17
0cc4m
approved these changes
on 2025-08-23
vulkan: optimize rms_norm, and allow the work to spread across multip…
b26cf611
Change add+rms_norm optimization to write out an array of partial sums
5643b4a3
complete rebase against fused adds - multi_add shader can also comput…
7856a7a8
fix validation errors
a675d0c3
disable add_rms_fusion for Intel due to possible driver bug
8d382bcb
resolve against #15489, sync after clearing partial sums
e97e226a
jeffbolznv
merged
611f419c
into master 144 days ago
Assignees
No one assigned
Labels
testing
Vulkan
ggml