llama.cpp
61bde8e2 - vulkan: Reduce temporary memory usage for TOP_K (#17623)

Commit
13 days ago
- Compute row size for the temp buffer based on the output of the first pass.
- Update shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; what used to be "ncols_output" is now "k".

For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2MB to 500KB.
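
The quoted sizes can be roughly reproduced with some back-of-the-envelope arithmetic. The sketch below is not the code from the commit. It assumes three things the message does not state: that each temporary entry is 8 bytes (a float value plus a 32-bit index), that two temporary buffers are alternated between passes, and that each first-pass workgroup consumes 256 input elements and emits at most K survivors. All function and constant names are hypothetical.

```cpp
#include <cstddef>

// Assumption: one temp entry = 4-byte float value + 4-byte index.
constexpr std::size_t kEntryBytes = 8;
// Assumption: two temp buffers are alternated between reduction passes.
constexpr std::size_t kNumTempBuffers = 2;

// Sizing by input row width: every temp row is as wide as the source row.
constexpr std::size_t temp_bytes_input_row(std::size_t ncols, std::size_t nrows) {
    return kNumTempBuffers * nrows * ncols * kEntryBytes;
}

// Width of one row after the first pass: each workgroup reads up to
// elems_per_wg inputs and writes at most k of them.
constexpr std::size_t first_pass_output_cols(std::size_t ncols, std::size_t k,
                                             std::size_t elems_per_wg) {
    const std::size_t num_workgroups = (ncols + elems_per_wg - 1) / elems_per_wg;
    const std::size_t survivors = k < elems_per_wg ? k : elems_per_wg;
    return num_workgroups * survivors;
}

// Sizing by first-pass output width, as the commit message describes.
constexpr std::size_t temp_bytes_output_row(std::size_t ncols, std::size_t nrows,
                                            std::size_t k, std::size_t elems_per_wg) {
    return kNumTempBuffers * nrows *
           first_pass_output_cols(ncols, k, elems_per_wg) * kEntryBytes;
}

// K=40, src0=(200000,1,1,1), hypothetical 256 elements per workgroup.
static_assert(temp_bytes_input_row(200000, 1) == 3200000, "about 3.2MB");
static_assert(first_pass_output_cols(200000, 40, 256) == 31280, "782 workgroups * 40");
static_assert(temp_bytes_output_row(200000, 1, 40, 256) == 500480, "about 500KB");
```

Under these assumptions the first pass shrinks a 200000-column row to 782 * 40 = 31280 entries, which is where the roughly 6x saving comes from. The actual workgroup size and buffer layout in the Vulkan backend may differ.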