PR #11081 vulkan: scale caching for k quants + misc fixes

netrunnereve merged 25 commits into ggml-org:master from vulkan

q6_k scale caching

d122d5c9

16 bit unpack

6b06d168

q4_k test (slow)

21c6b805

revert it

b0e4ccbe

q3_k

07d0d58b

q2_k

d70a7316

github-actions added Vulkan

github-actions added ggml

netrunnereve requested a review from 0cc4m 1 year ago

github-actions added script

github-actions added python

github-actions added Apple Metal

little stuff

c01ccf82

netrunnereve removed script

netrunnereve removed python

netrunnereve removed Apple Metal

jeffbolznv requested a review from jeffbolznv 1 year ago

0cc4m commented on 2025-01-05

jeffbolznv requested changes on 2025-01-05

netrunnereve requested a review from JohannesGaessler 1 year ago

netrunnereve requested a review from ngxson 1 year ago

github-actions added script

github-actions added testing

github-actions added Nvidia GPU

github-actions added examples

github-actions added python

github-actions added devops

github-actions added server

github-actions added SYCL

github-actions added Apple Metal

try precalculating products of a and q2_k scales

bdd98c74

Revert "try precalculating products of a and q2_k scales"

17307718

unpack should be u16, add vim swap to gitignore (about time)

b4ae7005

better q4_k scales

cdf70cf2

q5_k

6f5d62b0

better q6_k with separate paths for all threads and partial threads i…

91f1d9ce

q2_k better dequant

cc28742c

q3_k optimizations

fe71a8c4

q3_k use hmask simd from cpu avx version

923e9a83

netrunnereve removed script

netrunnereve removed testing

netrunnereve removed Nvidia GPU

netrunnereve removed examples

netrunnereve removed python

netrunnereve removed devops

netrunnereve removed server

netrunnereve removed SYCL

netrunnereve removed Apple Metal

netrunnereve removed review request from ngxson 1 year ago

netrunnereve removed review request from JohannesGaessler 1 year ago

Merge https://github.com/ggerganov/llama.cpp into vulkan

c9463641

make the caches happy

51b5ac50

q3_k separate out calculation

973bc406

q2_k separate out

6145fc79

little stuff

845d572b

jeffbolznv commented on 2025-01-10

merge master

d63497b3

use calc_superblock everywhere

30eacad2

q2_k optimize scale calculation

ed1ad94c

jeffbolznv requested changes on 2025-01-12

more barriers

4ae3fc01

jeffbolznv approved these changes on 2025-01-13

netrunnereve merged adc5dd92 into master 1 year ago

netrunnereve deleted the vulkan branch 1 year ago

Reviewers

jeffbolznv

0cc4m

Assignees

No one assigned

Labels

Vulkan, ggml

Milestone

No milestone

llama.cpp vulkan: scale caching for k quants + misc fixes #11081 Merged
