llama.cpp
k-quants with super-block size of 64
#2001
Merged
ggerganov merged 35 commits into master from ik/k_quants_64
k_quants: WIP super-blocks with 64 weights
d2f12ac3
k_quants: WIP super-blocks with 64 weights
9fe2a2b1
k_quants: WIP super-blocks with 64 weights
1f6195c2
k_quants: WIP super-blocks with 64 weights
aebd5471
k_quants: WIP super-blocks with 64 weights
2b2ab31a
k_quants: WIP super-blocks with 64 weights
bcf8c5c3
k_quants: WIP super-blocks with 64 weights
c6c35366
k_quants: WIP super-blocks with 64 weights
5aae4b8d
k_quants: WIP super-blocks with 64 weights
41e46ec1
k_quants: WIP super-blocks with 64 weights
460dd841
k_quants: WIP super-blocks with 64 weights
3bd9ae79
k_quants: WIP super-blocks with 64 weights
03f30c8e
k_quants: WIP super-blocks with 64 weights
cda47a6b
k_quants: WIP super-blocks with 64 weights
80c75fe8
k_quants: WIP super-blocks with 64 weights
2b2a13c4
k_quants: WIP super-blocks with 64 weights
9d27d8d0
k_quants: WIP super-blocks with 64 weights
2ff543c1
k_quants: WIP super-blocks with 64 weights
d92c5a9e
k_quants: WIP super-blocks with 64 weights
fae24afd
k_quants: WIP super-blocks with 64 weights
e1bbcfc5
k_quants: WIP super-blocks with 64 weights
167a0bbe
k_quants: WIP super-blocks with 64 weights
6081a655
k_quants: WIP super-blocks with 64 weights
ff83e32c
k_quants: WIP super-blocks with 64 weights
285eeb15
k_quants: call them _K, not _k, also on Metal
8b98d01e
k_quants: correctly define QK_K in llama.cpp
558a1942
Fixed bug in q4_K quantization added with the 64-block addition
333ffcc5
Simplify via lambda
88412a1a
k_quants: switch Q3_K to 4-bit scales when QK_K = 64
aeefd4e7
k_quants: switch Q4_K to 4-bit scales when QK_K = 64
ce19b965
k_quants: forgot to add the Metal changes in last commit
4f615069
k_quants: change Q5_K to be type 0 when QK_K = 64
ccf49013
k_quants: AVX2 implementation for new 64-weight Q5_K
2da3a597
k_quants: 10% faster ARM_NEON Q5_K dot product
53e81ca2
k_quants: fixed issue caused by merging with master
5fd83379
ikawrakow requested a review from ggerganov 2 years ago
ggerganov approved these changes on 2023-06-26
ggerganov merged 6769e944 into master 2 years ago
ggerganov deleted the ik/k_quants_64 branch 2 years ago
Reviewers
ggerganov
Assignees
No one assigned
Labels
None yet
Milestone
No milestone