llama.cpp
4f0154b0 - llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)

Commit message:

* Add support for quantizing already quantized models
* Threaded dequantizing and f16 to f32 conversion
* Clean up thread blocks with spares calculation a bit
* Use std::runtime_error exceptions