llama.cpp
c74759a2 - vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

Commit
1 day ago
vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

This allows vec4 loads of the B elements. Also increase BK to 64 when
this is enabled. Neither of these alone is consistently faster, but
together these give a nice speedup.

In ggml-vulkan.cpp, we need to make sure the B matrix alignment and
stride are multiples of 4.
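The host-side condition described above can be illustrated with a minimal sketch. This is not the code from ggml-vulkan.cpp: the helper names `b_matrix_allows_vec4` and `select_bk` are hypothetical, and the sketch assumes the alignment and stride are counted in elements. The commit message does not state the BK value used when the vec4 path is disabled, so it is passed in as a parameter.

```cpp
#include <cstdint>

// Hypothetical helper (not from ggml-vulkan.cpp).
// A vec4 load reads 4 consecutive B elements at once, so the path is only
// taken when both the starting offset and the row stride of B are multiples
// of 4. Assumption: both values are measured in elements, not bytes.
constexpr bool b_matrix_allows_vec4(uint64_t b_offset_elems, uint64_t b_stride_elems) {
    constexpr uint64_t kVecWidth = 4;
    return (b_offset_elems % kVecWidth == 0) && (b_stride_elems % kVecWidth == 0);
}

// Hypothetical tile-depth selection: BK is raised to 64 only when the vec4
// load path is enabled; otherwise the caller's existing default is kept.
constexpr uint32_t select_bk(bool vec4_loads_enabled, uint32_t default_bk) {
    return vec4_loads_enabled ? 64u : default_bk;
}
```

Coupling the two choices in one place mirrors the commit's observation that neither change alone is consistently faster: the larger BK is only selected when the vec4 path is actually usable.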