llama.cpp
vulkan: optimize rms_norm, and allow the work to spread across multiple SMs
#15281
Merged

jeffbolznv
jeffbolznv requested a review from 0cc4m 30 days ago
github-actions added testing
github-actions added Vulkan
github-actions added ggml
jeffbolznv marked this pull request as draft 30 days ago
jeffbolznv commented on 2025-08-13
jeffbolznv force pushed from e0b01dbb to 075dac28 29 days ago
jeffbolznv force pushed from 075dac28 to c5236362 26 days ago
jeffbolznv marked this pull request as ready for review 26 days ago
0cc4m commented on 2025-08-17
jeffbolznv force pushed from 7658305a to cd20ef00 21 days ago
jeffbolznv requested a review from 0cc4m 21 days ago
0cc4m approved these changes on 2025-08-23
jeffbolznv vulkan: optimize rms_norm, and allow the work to spread across multip…
b26cf611
jeffbolznv Change add+rms_norm optimization to write out an array of partial sums
5643b4a3
jeffbolznv complete rebase against fused adds - multi_add shader can also comput…
7856a7a8
jeffbolznv fix validation errors
a675d0c3
jeffbolznv disable add_rms_fusion for Intel due to possible driver bug
8d382bcb
jeffbolznv resolve against #15489, sync after clearing partial sums
e97e226a
jeffbolznv force pushed from cd20ef00 to e97e226a 19 days ago
jeffbolznv merged 611f419c into master 19 days ago
