transformers
7a0d582a - [GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386)

Commit
33 days ago
[GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386)

* [GGUF] cast dequantized tensors to target dtype during load

Signed-off-by: UsamaKenway <usamakenway@gmail.com>

* [GGUF] refac dtype, quantization casting

Signed-off-by: Usama Kenway <usamakenway@gmail.com>

* [GGUF] refac dtype

Signed-off-by: Usama Kenway <usamakenway@gmail.com>

---------

Signed-off-by: UsamaKenway <usamakenway@gmail.com>
Signed-off-by: Usama Kenway <usamakenway@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
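
The idea named in the commit title can be illustrated with a small NumPy sketch. This is not the code from the commit: `fake_dequantize`, `load_late_cast`, and `load_early_cast` are hypothetical names, the dequantization step is a stand-in that simply produces float32 data, and the byte accounting is a rough estimate rather than a measurement. It only shows why casting each tensor to the target dtype as soon as it is dequantized keeps fewer full-precision bytes alive at once than casting everything at the end.

```python
import numpy as np


def fake_dequantize(shape):
    # Hypothetical stand-in for GGUF dequantization: always yields float32.
    return np.ones(shape, dtype=np.float32)


def load_late_cast(shapes, target_dtype):
    # Dequantize everything first, so all float32 tensors are alive together.
    full = {name: fake_dequantize(shape) for name, shape in shapes.items()}
    # Lower bound on the peak: the float32 copies alone, before any cast.
    peak = sum(t.nbytes for t in full.values())
    state = {name: t.astype(target_dtype) for name, t in full.items()}
    return state, peak


def load_early_cast(shapes, target_dtype):
    # Cast each tensor right after dequantizing it, then drop the float32 copy.
    state, held, peak = {}, 0, 0
    for name, shape in shapes.items():
        tmp = fake_dequantize(shape)
        cast = tmp.astype(target_dtype)
        # At this point we hold: earlier cast tensors + one float32 + its cast.
        peak = max(peak, held + tmp.nbytes + cast.nbytes)
        state[name] = cast
        held += cast.nbytes
        del tmp  # the float32 intermediate can now be freed
    return state, peak


shapes = {"a": (256, 256), "b": (256, 256), "c": (256, 256)}
_, late_peak = load_late_cast(shapes, np.float16)
_, early_peak = load_early_cast(shapes, np.float16)
print(f"late cast peak (estimate):  {late_peak} bytes")
print(f"early cast peak (estimate): {early_peak} bytes")
```

With a float16 target, the early-cast loop holds at most one float32 intermediate at a time, so the gap between the two estimates grows with the number of tensors.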