llama.cpp
e06c3ab2 - vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

Commit
13 days ago
vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

* vulkan: change gated_delta_net to shard a column across a subgroup

  This is based on https://github.com/ggml-org/llama.cpp/pull/20391. I used an LLM to port the CUDA code to Vulkan, and guided it to make various fixes to work with Vulkan (e.g. handling different subgroup sizes, unknown mapping of subgroup to invocation id, using subgroupAdd optionally, etc.).

  This fixes a perf regression from the transposing of the values in memory (!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups
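The commit message does not include the shader, so the following is only a rough CPU-side model of what "sharding a column across a subgroup" with an optional subgroupAdd can look like. It is not the code from this commit: the function names, the strided row-to-lane mapping, and the tree-reduction fallback are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: one column of length col.size() is split across `lanes`
// lanes. Lane l handles rows l, l + lanes, l + 2 * lanes, ... (assumed mapping).
static std::vector<float> lane_partials(const std::vector<float>& col,
                                        const std::vector<float>& k,
                                        std::size_t lanes) {
    std::vector<float> partial(lanes, 0.0f);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        for (std::size_t row = lane; row < col.size(); row += lanes) {
            partial[lane] += col[row] * k[row];
        }
    }
    return partial;
}

// Models the result of a subgroupAdd-style reduction: the sum of every
// lane's partial value.
static float reduce_subgroup_add(const std::vector<float>& partial) {
    float sum = 0.0f;
    for (float p : partial) {
        sum += p;
    }
    return sum;
}

// Assumed fallback when subgroup arithmetic is not used: a tree reduction,
// as is commonly done through shared memory. Requires a power-of-two lane count.
static float reduce_tree(std::vector<float> partial) {
    for (std::size_t stride = partial.size() / 2; stride > 0; stride /= 2) {
        for (std::size_t lane = 0; lane < stride; ++lane) {
            partial[lane] += partial[lane + stride];
        }
    }
    return partial[0];
}

// Dot product of one column with k, computed with the column sharded across lanes.
float column_dot(const std::vector<float>& col,
                 const std::vector<float>& k,
                 std::size_t lanes,
                 bool use_subgroup_add) {
    const std::vector<float> partial = lane_partials(col, k, lanes);
    return use_subgroup_add ? reduce_subgroup_add(partial) : reduce_tree(partial);
}
```

Calling `column_dot` with different `lanes` values (for example 8, 32, or 64) mirrors the point about handling different subgroup sizes, and using fewer lanes per column corresponds loosely to the second bullet about spreading columns across fewer lanes.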