llama.cpp
015022bb - vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931)

Commit

149 days ago

vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931)

The grouped query attention optimization doesn't require a power-of-two ratio; the only thing relying on it was the modulo operation written as a bitwise &.

split_k need not depend on gqa_ratio; enable it any time there's only one workgroup in the X dimension. The shader gets the split index from the x coord, and multiple workgroups in the X dimension (pre-split) indicate a larger FA operation that wouldn't need splitting.
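The two points in the commit message can be illustrated with a small host-side sketch. This is not the code from the commit: the function names `wrap_with_mask`, `wrap_with_mod`, and `should_use_split_k`, and the parameter `workgroups_x`, are hypothetical and chosen only for illustration. The actual change lives in the Vulkan backend and its shaders, and the real split_k heuristic may include conditions not mentioned in the message.

```cpp
#include <cstdint>

// Index wrap written as a bitwise AND.
// Equals i % gqa_ratio only when gqa_ratio is a power of two.
inline uint32_t wrap_with_mask(uint32_t i, uint32_t gqa_ratio) {
    return i & (gqa_ratio - 1);
}

// Index wrap written as a true modulo.
// Correct for any non-zero gqa_ratio, power of two or not.
inline uint32_t wrap_with_mod(uint32_t i, uint32_t gqa_ratio) {
    return i % gqa_ratio;
}

// Hypothetical version of the condition described in the commit message:
// split along K only when there is a single workgroup in the X dimension,
// since several workgroups in X (pre-split) indicate a larger FA operation.
// Assumption: the real heuristic may check more than this.
inline bool should_use_split_k(uint32_t workgroups_x) {
    return workgroups_x == 1;
}
```

With a ratio of 4 the two wrap functions agree (`5 & 3` and `5 % 4` are both 1). With a ratio of 3 they diverge: `4 & 2` is 0 while `4 % 3` is 1. That is why replacing the mask with a real modulo is enough to lift the power-of-two restriction.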

References

#12931 - vulkan: enable coopmat2 FA gqa and split_k optimizations more often

Author

jeffbolznv

Parents

b43d89e3
