llama.cpp
SOTA 3-bit quants #5196
Merged
ikawrakow merged 14 commits into master from ik/iq3_xxs
iq3_xxs: quantize/dequantize (8524d277)
iq3_xxs: CUDA dequantize works (bf9349c6)
iq2_xxs: tuning quantization (90faca24)
iq3_xxs: starting to look better (f1206729)
iq3_xxs: CUDA dot product (f1875b0a)
iq3_xxs: scalar and AVX2 dot products (c3b20296)
iq3_xxs: ARM_NEON and Metal (15493023)
iq3_xxs: slightly better grid points (51cde193)
Faster iq3_xxs and iq2_xs dot products on CUDA (68cfcd47)
iq3_xxs: add some quant mix (7e4e7488)
iq3_xxs: fix failing quantization test (6efbc690)
iq3_xxs: hopefully fix ROCm (62623434)
iq3_xxs: failing tests (fe2160ee)
ggerganov approved these changes on 2024-01-30
Add IQ3_XXS to test-backend-ops (fb6576bc)
ikawrakow merged f4d7e549 into master 1 year ago
ikawrakow deleted the ik/iq3_xxs branch 1 year ago
mofosyne added the Review Complexity : High label
mofosyne added the Tensor Encoding Scheme label
Reviewers: ggerganov
Assignees: No one assigned
Labels: Review Complexity : High, Tensor Encoding Scheme
Milestone: No milestone