PR #5453 1.5 bit quantization

iq1_s: WIP basics

Iwan Kawrakow committed 1 year ago

iq1_s: CUDA is working

Iwan Kawrakow committed 1 year ago

iq1_s: scalar CPU dot product

Iwan Kawrakow committed 1 year ago

iq1_s: WIP AVX2 dot product - something is not right

Iwan Kawrakow committed 1 year ago

Fix tests

Iwan Kawrakow committed 1 year ago

Fix shadow warnings

Iwan Kawrakow committed 1 year ago

Fix after merge with latest master

Iwan Kawrakow committed 1 year ago

iq1_s: AVX2 finally works

Iwan Kawrakow committed 1 year ago

iq1_s: ARM_NEON dot product. Works, but not very fast

Iwan Kawrakow committed 1 year ago

iq1_s: better grid

Iwan Kawrakow committed 1 year ago

iq1_s: use IQ2_XXS for attn_output

Iwan Kawrakow committed 1 year ago

iq1_s: Metal basics

Iwan Kawrakow committed 1 year ago

iq1_s: Metal works, but quite slow

Iwan Kawrakow committed 1 year ago

iq1_s: Tests

Iwan Kawrakow committed 1 year ago

iq1_s: slightly faster dot product

Iwan Kawrakow committed 1 year ago

llama.cpp 1.5 bit quantization #5453 Merged