[GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386)
* [GGUF] cast dequantized tensors to target dtype during load
Signed-off-by: UsamaKenway <usamakenway@gmail.com>
* [GGUF] refactor dtype and quantization casting
Signed-off-by: Usama Kenway <usamakenway@gmail.com>
* [GGUF] refactor dtype
Signed-off-by: Usama Kenway <usamakenway@gmail.com>
---------
Signed-off-by: UsamaKenway <usamakenway@gmail.com>
Signed-off-by: Usama Kenway <usamakenway@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>