llama.cpp
7ac89021 - vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)

Commit
8 days ago
vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)

* vulkan: Use BK=32 for coopmat2 mul_mat_id

* vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader

Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.

Don't slice/offset the B matrix to ic * BN, only to adjust the coord back down
to the range [0, BN) in decodeFuncB. Instead just slice with a row offset of
zero and remove the '& (BN - 1)'. This allows the compiler to common some of
the shared memory loads.
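The index reasoning in the message can be checked with a small CPU-side model. This is only a sketch under stated assumptions, not the shader code: the tile width BN = 8, the helper names local_row_masked, local_row_direct and load_b, and the use of std::vector in place of shared memory are all hypothetical. It illustrates two points: for a power-of-two BN, masking an ic * BN-offset coordinate with '& (BN - 1)' gives the same local index as a zero-offset slice, and a zero-initialized row_ids table keeps every unfilled slot pointing at a valid row, so the lookup needs no bounds check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tile width. The mask trick only works when BN is a power of two.
constexpr uint32_t BN = 8;

// Old scheme (as described in the message): the B slice starts at ic * BN,
// so the decode callback sees an offset coordinate and masks it back
// down into [0, BN).
uint32_t local_row_masked(uint32_t ic, uint32_t r) {
    uint32_t coord = ic * BN + r;
    return coord & (BN - 1);
}

// New scheme: the slice uses a row offset of zero, so the coordinate is
// already the local index and no mask is needed.
uint32_t local_row_direct(uint32_t r) {
    return r;
}

// Lookup without an out-of-bounds check. Safe only because row_ids is
// zero-initialized: slots that were never filled map to row 0 of b.
float load_b(const std::vector<float>& b,
             const std::vector<uint32_t>& row_ids,
             uint32_t local) {
    return b[row_ids[local]];
}
```

In this model the masked and direct indices agree for every tile index ic, which is why dropping the offset and the mask leaves the result unchanged while giving the compiler simpler, repeatable addresses to work with.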