llama.cpp
vulkan: scale caching for k quants + misc fixes
#11081
Merged

netrunnereve merged 25 commits into ggml-org:master from vulkan
netrunnereve q6_k scale caching
d122d5c9
netrunnereve 16 bit unpack
6b06d168
netrunnereve q4_k test (slow)
21c6b805
netrunnereve revert it
b0e4ccbe
netrunnereve q3_k
07d0d58b
netrunnereve q2_k
d70a7316
github-actions added Vulkan
github-actions added ggml
netrunnereve requested a review from 0cc4m 1 year ago
github-actions added script
github-actions added python
github-actions added Apple Metal
netrunnereve little stuff
c01ccf82
netrunnereve removed script
netrunnereve removed python
netrunnereve removed Apple Metal
jeffbolznv requested a review from jeffbolznv 1 year ago
0cc4m commented on 2025-01-05
jeffbolznv requested changes on 2025-01-05
netrunnereve requested a review from JohannesGaessler 1 year ago
netrunnereve requested a review from ngxson 1 year ago
github-actions added script
github-actions added testing
github-actions added Nvidia GPU
github-actions added examples
github-actions added python
github-actions added devops
github-actions added server
github-actions added SYCL
github-actions added Apple Metal
netrunnereve try precalculating products of a and q2_k scales
bdd98c74
netrunnereve Revert "try precalculating products of a and q2_k scales"
17307718
netrunnereve unpack should be u16, add vim swap to gitignore (about time)
b4ae7005
netrunnereve better q4_k scales
cdf70cf2
netrunnereve q5_k
6f5d62b0
netrunnereve better q6_k with separate paths for all threads and partial threads i…
91f1d9ce
netrunnereve q2_k better dequant
cc28742c
netrunnereve q3_k optimizations
fe71a8c4
netrunnereve q3_k use hmask simd from cpu avx version
923e9a83
netrunnereve removed script
netrunnereve removed testing
netrunnereve removed Nvidia GPU
netrunnereve removed examples
netrunnereve removed python
netrunnereve removed devops
netrunnereve removed server
netrunnereve removed SYCL
netrunnereve removed Apple Metal
netrunnereve removed review request from ngxson 1 year ago
netrunnereve removed review request from JohannesGaessler 1 year ago
netrunnereve Merge https://github.com/ggerganov/llama.cpp into vulkan
c9463641
netrunnereve make the caches happy
51b5ac50
netrunnereve q3_k separate out calculation
973bc406
netrunnereve q2_k separate out
6145fc79
netrunnereve little stuff
845d572b
jeffbolznv commented on 2025-01-10
netrunnereve merge master
d63497b3
netrunnereve use calc_superblock everywhere
30eacad2
netrunnereve q2_k optimize scale calculation
ed1ad94c
jeffbolznv requested changes on 2025-01-12
netrunnereve more barriers
4ae3fc01
jeffbolznv approved these changes on 2025-01-13
netrunnereve merged adc5dd92 into master 1 year ago
netrunnereve deleted the vulkan branch 1 year ago
