vulkan: optimize rms_norm, and allow the work to spread across multiple SMs #15281
jeffbolznv marked this pull request as draft 30 days ago
jeffbolznv force pushed from e0b01dbb to 075dac28 29 days ago
jeffbolznv force pushed from 075dac28 to c5236362 26 days ago
jeffbolznv marked this pull request as ready for review 26 days ago
0cc4m commented on 2025-08-17
jeffbolznv force pushed from 7658305a to cd20ef00 21 days ago
0cc4m approved these changes on 2025-08-23
vulkan: optimize rms_norm, and allow the work to spread across multip…
b26cf611
Change add+rms_norm optimization to write out an array of partial sums
5643b4a3
complete rebase against fused adds - multi_add shader can also comput…
7856a7a8
fix validation errors
a675d0c3
disable add_rms_fusion for Intel due to possible driver bug
8d382bcb
resolve against #15489, sync after clearing partial sums
e97e226a
jeffbolznv force pushed from cd20ef00 to e97e226a 19 days ago
jeffbolznv merged 611f419c into master 19 days ago
Assignees: No one assigned
Labels: testing, Vulkan, ggml