PR #4856 SOTA 2-bit quants - part 2

iq2_xs: basics

Iwan Kawrakow committed 2 years ago

iq2_xs: this should have been in the basics

Iwan Kawrakow committed 2 years ago

iq2_xs: CUDA and scalar CPU works

Iwan Kawrakow committed 2 years ago

iq2_xs: WIP Metal

Iwan Kawrakow committed 2 years ago

iq2_xs: Metal now works

Iwan Kawrakow committed 2 years ago

iq2_xs: working, but dog slow, ARM_NEON dot product

Iwan Kawrakow committed 2 years ago

iq2_xs: better ARM_NEON dot product

Iwan Kawrakow committed 2 years ago

iq2_xs: AVX2 dot product - 19.5 t/s

Iwan Kawrakow committed 2 years ago

iq2_xs: faster AVX2 dot product

Iwan Kawrakow committed 2 years ago

iq2_xs: had forgotten to delete iq2-data.h

Iwan Kawrakow committed 2 years ago

Add llama enum for IQ2_XS

Iwan Kawrakow committed 2 years ago

llama.cpp SOTA 2-bit quants - part 2 #4856 Merged