llama.cpp
4f0154b0 - llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)

Commit message:

* Add support for quantizing already quantized models
* Threaded dequantizing and f16 to f32 conversion
* Clean up thread blocks with spares calculation a bit
* Use std::runtime_error exceptions