transformers
7a0d582a - [GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386)

Commit

33 days ago

[GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386)

* [GGUF] cast dequantized tensors to target dtype during load

Signed-off-by: UsamaKenway <usamakenway@gmail.com>

* [GGUF] refac dtype, quantization casting

Signed-off-by: Usama Kenway <usamakenway@gmail.com>

* [GGUF] refac dtype

Signed-off-by: Usama Kenway <usamakenway@gmail.com>

---------

Signed-off-by: UsamaKenway <usamakenway@gmail.com>
Signed-off-by: Usama Kenway <usamakenway@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
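The commit title's reasoning (casting each dequantized tensor right away lowers peak RAM) can be illustrated with a toy byte-accounting model. This is only a sketch and not the code from this commit: the function names, the 4-byte dequantized item size (float32), and the 2-byte target item size (e.g. float16) are all assumptions made for illustration, and the "cast late" baseline is a hypothetical comparison point rather than a description of the previous loader.

```python
def peak_bytes_cast_late(num_elements, dequant_itemsize=4, target_itemsize=2):
    """Hypothetical baseline: dequantize every tensor first, cast afterwards."""
    live = 0
    peak = 0
    # Phase 1: all tensors are held in the wider dequantized dtype.
    for n in num_elements:
        live += n * dequant_itemsize
        peak = max(peak, live)
    # Phase 2: cast one tensor at a time; source and copy coexist briefly.
    for n in num_elements:
        live += n * target_itemsize
        peak = max(peak, live)
        live -= n * dequant_itemsize  # wide source released after the cast
    return peak


def peak_bytes_cast_early(num_elements, dequant_itemsize=4, target_itemsize=2):
    """Assumed early-cast strategy: cast each tensor right after dequantizing."""
    live = 0
    peak = 0
    for n in num_elements:
        live += n * dequant_itemsize  # temporary wide tensor
        peak = max(peak, live)
        live += n * target_itemsize   # narrow copy that is kept
        peak = max(peak, live)
        live -= n * dequant_itemsize  # temporary released immediately
    return peak


if __name__ == "__main__":
    sizes = [1000] * 10  # ten equally sized tensors (element counts)
    print("cast late :", peak_bytes_cast_late(sizes), "bytes at peak")
    print("cast early:", peak_bytes_cast_early(sizes), "bytes at peak")
```

In this model the early-cast peak is roughly the final model size plus one wide temporary, while the late-cast peak is at least the whole model in the wide dtype. The real savings depend on tensor sizes, the allocator, and the actual loader code.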

References

#45386 - [GGUF] Reduce peak RAM usage by casting dequantized tensors early during load

Author

UsamaKenway

Parents

ce77bc37
