llama.cpp
c74759a2 - vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

Commit

42 days ago

vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

This allows vec4 loads of the B elements. Also increase BK to 64 when this is enabled. Neither of these alone is consistently faster, but together these give a nice speedup.

In ggml-vulkan.cpp, we need to make sure the B matrix alignment and stride are multiples of 4.
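The multiple-of-4 requirement can be illustrated with a minimal sketch. This is not the code from ggml-vulkan.cpp: the function names, the parameter names, and the choice to measure offset and stride in elements are all assumptions made for illustration, and the fallback BK value is left to the caller because the commit message does not state it.

```cpp
#include <cstdint>

// Hypothetical helper (not from ggml-vulkan.cpp): a vec4 load reads four
// consecutive B elements at once, so both the starting offset and the row
// stride of B must be multiples of 4 for every load to stay aligned.
// Both arguments are assumed to be measured in elements, not bytes.
constexpr bool can_use_vec4_b_loads(uint64_t b_offset_elems, uint64_t b_stride_elems) {
    return (b_offset_elems % 4 == 0) && (b_stride_elems % 4 == 0);
}

// Hypothetical tile-depth selection: the commit raises BK to 64 only when
// the vec4 path is enabled. The fallback value is supplied by the caller
// because the commit message does not say what it is.
constexpr uint32_t choose_bk(bool vec4_b_loads, uint32_t fallback_bk) {
    return vec4_b_loads ? 64u : fallback_bk;
}
```

Under these assumptions, a B matrix with offset 0 and stride 128 would qualify for the vec4 path, while an offset of 2 or a stride of 130 would fall back to the scalar path and keep the caller's BK.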

References

#23991 - vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads

Author

jeffbolznv

Parents

0f7fada5
