llama.cpp
SOTA 2-bit quants
#4773
Merged

ikawrakow merged 17 commits into master from ik/iq2_2.06bpw
ikawrakow
ggerganov added high priority
slaren
ikawrakow force pushed to a12488bc 1 year ago
ikawrakow
JohannesGaessler
JohannesGaessler
JohannesGaessler commented on 2024-01-04
iq2_xxs: basics
4af24881
iq2_xxs: scalar and AVX2 dot products
7ef63896
iq2_xxs: ARM_NEON dot product
7b72318e
iq2_xxs: WIP Metal
d383f00e
iq2_xxs: Metal dot product now works
dd296101
iq2_xxs: slightly faster dot product
1c96aa0d
iq2_xxs: slightly faster dot product
e211fadc
iq2_xxs: even faster Metal dot product
065cc8cb
iq2_xxs: dequantize CUDA kernel - fix conflict with master
06e6908a
iq2_xxs: quantized CUDA dot product (MMVQ)
82405219
iq2_xxs: slightly faster CUDA dot product
c19d0d09
iq2_xxs: add to llama ftype enum
fd42737c
iq2_xxs: fix MoE on Metal
47ae9b8f
Fix missing MMQ ops when on hipBLAS
61c04053
Fix bug in qequantize_row_iq2_xxs
7db967e8
ikawrakow force pushed to 7db967e8 1 year ago
Fixing tests
5684d790
JianbangZ
ggerganov
ggerganov approved these changes on 2024-01-08
JohannesGaessler
JohannesGaessler approved these changes on 2024-01-08
PR suggestion
bad5f7f3
ikawrakow merged dd5ae064 into master 1 year ago
ikawrakow deleted the ik/iq2_2.06bpw branch 1 year ago
mofosyne added Tensor Encoding Scheme
mofosyne added Review Complexity : High
afsara-ben
