iq2_xxs: tune quantization (#5320)

Commit

2 years ago

iq2_xxs: tune quantization (#5320) We get slightly better PPL, and we cut quantization time in nearly half. The trick is to 1st quantize without forcing points onto the E8-lattice. We can then use a narrower search range around the block scale that we got that way. Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>