llama.cpp
vulkan: optimize rms_norm, and allow the work to spread across multiple SMs
#15281
Merged

jeffbolznv
jeffbolznv requested a review from 0cc4m 155 days ago
github-actions added testing
github-actions added Vulkan
github-actions added ggml
jeffbolznv marked this pull request as draft 155 days ago
jeffbolznv
jeffbolznv
jeffbolznv commented on 2025-08-13
jeffbolznv force pushed 154 days ago
jeffbolznv force pushed 151 days ago
jeffbolznv marked this pull request as ready for review 151 days ago
jeffbolznv
0cc4m
jeffbolznv
0cc4m
0cc4m
0cc4m commented on 2025-08-17
jeffbolznv
characharm
jeffbolznv force pushed 146 days ago
jeffbolznv requested a review from 0cc4m 146 days ago
0cc4m
0cc4m
0cc4m approved these changes on 2025-08-23
jeffbolznv vulkan: optimize rms_norm, and allow the work to spread across multip…
b26cf611
jeffbolznv Change add+rms_norm optimization to write out an array of partial sums
5643b4a3
jeffbolznv complete rebase against fused adds - multi_add shader can also comput…
7856a7a8
jeffbolznv fix validation errors
a675d0c3
jeffbolznv disable add_rms_fusion for Intel due to possible driver bug
8d382bcb
jeffbolznv resolve against #15489, sync after clearing partial sums
e97e226a
jeffbolznv force pushed to e97e226a 144 days ago
jeffbolznv merged 611f419c into master 144 days ago
CISC
