PR #5196 SOTA 3-bit quants

ikawrakow merged 14 commits into master from ik/iq3_xxs

iq3_xxs: quantize/dequantize

8524d277

iq3_xxs: CUDA dequantize works

bf9349c6

iq2_xxs: tuning quantization

90faca24

iq3_xxs: starting to look better

f1206729

iq3_xxs: CUDA dot product

f1875b0a

iq3_xxs: scalar and AVX2 dot products

c3b20296

iq3_xxs: ARM_NEON and Metal

15493023

iq3_xxs: slightly better grid points

51cde193

Faster iq3_xxs and iq2_xs dot products on CUDA

68cfcd47

iq3_xxs: add some quant mix

7e4e7488

iq3_xxs: fix failing quantization test

6efbc690

iq3_xxs: hopefully fix ROCm

62623434

iq3_xxs: failing tests

fe2160ee

ggerganov approved these changes on 2024-01-30

Add IQ3_XXS to test-backend-ops

fb6576bc

ikawrakow merged f4d7e549 into master 2 years ago

ikawrakow deleted the ik/iq3_xxs branch 2 years ago

mofosyne added Review Complexity : High

mofosyne added Tensor Encoding Scheme

Reviewers

ggerganov

Labels

Review Complexity : High, Tensor Encoding Scheme
