llama.cpp
18ddaea2 - vulkan: Optimize GGML_OP_CUMSUM (#18417)

Commit

137 days ago

vulkan: Optimize GGML_OP_CUMSUM (#18417)

* vulkan: Optimize GGML_OP_CUMSUM

There are two paths: The preexisting one that does a whole row per workgroup
in a single shader, and one that splits each row into multiple blocks and does
two passes. The first pass computes partials within a block, the second adds
the block partials to compute the final result. The multipass shader is used
when there are a small number of large rows.

In the whole-row shader, handle multiple elements per invocation.

* use 2 ELEM_PER_THREAD for AMD/Intel

* address feedback
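The two-pass idea described in the commit message can be illustrated on the CPU. The following is a minimal sketch, not the Vulkan shader from this commit: the function names cumsum_reference and cumsum_two_pass and the block_size parameter are hypothetical, chosen only for illustration, and the sequential loops stand in for work that the real shader presumably distributes across workgroups.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Plain sequential inclusive prefix sum of one row (reference result).
std::vector<float> cumsum_reference(const std::vector<float>& row) {
    std::vector<float> out(row.size());
    float running = 0.0f;
    for (std::size_t i = 0; i < row.size(); ++i) {
        running += row[i];
        out[i] = running;
    }
    return out;
}

// Two-pass blocked prefix sum of one row.
// block_size is a hypothetical tuning parameter, not a value from the commit.
std::vector<float> cumsum_two_pass(const std::vector<float>& row,
                                   std::size_t block_size) {
    const std::size_t n = row.size();
    std::vector<float> out(n);
    if (n == 0) {
        return out;
    }
    block_size = std::max<std::size_t>(block_size, 1);
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::vector<float> block_totals(num_blocks, 0.0f);

    // Pass 1: each block computes its own partial prefix sums.
    // Blocks do not depend on each other, so they could run in parallel.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        float running = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            running += row[i];
            out[i] = running;
        }
        block_totals[b] = running;
    }

    // Pass 2: add the total of all preceding blocks to every element.
    float offset = 0.0f;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        for (std::size_t i = begin; i < end; ++i) {
            out[i] += offset;
        }
        offset += block_totals[b];
    }
    return out;
}
```

Splitting a row this way exposes more independent work when there are only a few long rows, which is consistent with the commit's note that the multipass shader is used in that case. Floating-point results may differ slightly from the sequential version because the additions are grouped differently.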

References

#18417 - vulkan: Optimize GGML_OP_CUMSUM

Author

jeffbolznv

Parents

706e3f93
