llama.cpp
e06c3ab2 - vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

Commit

53 days ago

vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

* vulkan: change gated_delta_net to shard a column across a subgroup

  This is based on https://github.com/ggml-org/llama.cpp/pull/20391. I used an LLM to port the CUDA code to Vulkan, and guided it to make various fixes to work with Vulkan (e.g. handling different subgroup sizes, unknown mapping of subgroup to invocation id, using subgroupAdd optionally, etc.). This fixes a perf regression from the transposing of the values in memory (!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups
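
To make the sharding idea concrete, here is a minimal CPU-side sketch in Python. It is not taken from the shader in this commit. It only illustrates splitting one column's dot product across several lanes and then combining the per-lane partial sums, either with a single add across lanes (roughly what subgroupAdd provides on the GPU) or with a tree reduction as a fallback. The function names, the strided row assignment, and the power-of-two lane count are all assumptions made for illustration.

```python
def sharded_partial_sums(column, vec, lanes_per_column):
    """Split one column's dot product across simulated lanes (hypothetical helper).

    Lane l accumulates rows l, l + L, l + 2L, ... where L = lanes_per_column.
    """
    assert len(column) == len(vec)
    partial = [0] * lanes_per_column
    for lane in range(lanes_per_column):
        for row in range(lane, len(column), lanes_per_column):
            partial[lane] += column[row] * vec[row]
    return partial


def reduce_with_subgroup_add(partial):
    """Stand-in for a subgroup-wide add: every lane's value is summed at once."""
    return sum(partial)


def reduce_with_tree(partial):
    """Fallback tree reduction; assumes the lane count is a power of two."""
    vals = list(partial)
    stride = len(vals) // 2
    while stride > 0:
        for lane in range(stride):
            vals[lane] += vals[lane + stride]
        stride //= 2
    return vals[0]


def column_dot(column, vec, lanes_per_column, use_subgroup_add=True):
    """Dot product of one column with vec, sharded across lanes_per_column lanes."""
    partial = sharded_partial_sums(column, vec, lanes_per_column)
    if use_subgroup_add:
        return reduce_with_subgroup_add(partial)
    return reduce_with_tree(partial)
```

In this sketch, lowering `lanes_per_column` means each lane does more work per column, which loosely mirrors the second bullet's trade-off of using fewer lanes per column so that fewer workgroups are needed.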

References

#20662 - vulkan: change gated_delta_net to shard a column across a subgroup

Author

jeffbolznv

Parents

dc659243
