vulkan: scale caching for k quants + misc fixes #11081
q6_k scale caching
d122d5c9
16 bit unpack
6b06d168
q4_k test (slow)
21c6b805
revert it
b0e4ccbe
q3_k
07d0d58b
q2_k
d70a7316
little stuff
c01ccf82
0cc4m commented on 2025-01-05
try precalculating products of a and q2_k scales
bdd98c74
Revert "try precalculating products of a and q2_k scales"
17307718
unpack should be u16, add vim swap to gitignore (about time)
b4ae7005
better q4_k scales
cdf70cf2
q5_k
6f5d62b0
better q6_k with separate paths for all threads and partial threads i…
91f1d9ce
q2_k better dequant
cc28742c
q3_k optimizations
fe71a8c4
q3_k use hmask simd from cpu avx version
923e9a83
Merge https://github.com/ggerganov/llama.cpp into vulkan
c9463641
make the caches happy
51b5ac50
q3_k separate out calculation
973bc406
q2_k separate out
6145fc79
little stuff
845d572b
merge master
d63497b3
use calc_superblock everywhere
30eacad2
q2_k optimize scale calculation
ed1ad94c
more barriers
4ae3fc01