llama.cpp
SOTA 2-bit quants
#4773
Merged
ikawrakow merged 17 commits into master from ik/iq2_2.06bpw
ggerganov added the high priority label
ikawrakow force pushed to a12488bc 1 year ago
JohannesGaessler commented on 2024-01-04
iq2_xxs: basics
4af24881
iq2_xxs: scalar and AVX2 dot products
7ef63896
iq2_xxs: ARM_NEON dot product
7b72318e
iq2_xxs: WIP Metal
d383f00e
iq2_xxs: Metal dot product now works
dd296101
iq2_xxs: slighty faster dot product
1c96aa0d
iq2_xxs: slighty faster dot product
e211fadc
iq2_xxs: even faster Metal dot product
065cc8cb
iq2_xxs: dequantize CUDA kernel - fix conflict with master
06e6908a
iq2_xxs: quantized CUDA dot product (MMVQ)
82405219
iq2_xxs: slightly faster CUDA dot product
c19d0d09
iq2_xxs: add to llama ftype enum
fd42737c
iq2_xxs: fix MoE on Metal
47ae9b8f
Fix missing MMQ ops when on hipBLAS
61c04053
Fix bug in qequantize_row_iq2_xxs
7db967e8
ikawrakow force pushed to 7db967e8 1 year ago
Fixing tests
5684d790
ggerganov approved these changes on 2024-01-08
JohannesGaessler approved these changes on 2024-01-08
PR suggestion
bad5f7f3
ikawrakow merged dd5ae064 into master 1 year ago
ikawrakow deleted the ik/iq2_2.06bpw branch 1 year ago
mofosyne added the Tensor Encoding Scheme label
mofosyne added the Review Complexity : High label
Reviewers: JohannesGaessler, ggerganov
Assignees: No one assigned
Labels: high priority, Review Complexity : High, Tensor Encoding Scheme
Milestone: No milestone