Add support for loading GPTQ models on CPU (#26719)
* Add support for loading GPTQ models on CPU
Right now, we can only load the GPTQ Quantized model on the CUDA
device. The attribute `gptq_supports_cpu` checks if the current
auto_gptq version is the one which has the cpu support for the
model or not.
The larger variants of the model are hard to load/run/trace on
the GPU and that's the rationale behind adding this attribute.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
* Update quantization.md
* Update quantization.md
* Update quantization.md