Fully deprecate AutoGPTQ in favor of GPT-QModel (#2385)
* call hf_select_quant_linear_v2()
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
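As context for this change, a minimal sketch of the new selection path: the QuantLinear implementation is now resolved through GPT-QModel's selector instead of auto_gptq's `dynamically_import_QuantLinear`. The import path and keyword arguments below are assumptions for illustration, not the exact call made in this PR.

```python
# Sketch only: resolve a QuantLinear class via GPT-QModel instead of auto_gptq.
# The module path and parameter names are assumptions, not the exact call in this PR.
from gptqmodel.utils.importer import hf_select_quant_linear_v2  # assumed location

QuantLinear = hf_select_quant_linear_v2(
    bits=4,            # assumed kwargs; mirror the model's GPTQ quantization config
    group_size=128,
    desc_act=False,
    sym=True,
)
```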
* Remove the import of auto_gptq
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* Remove the import of auto_gptq in unittests
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* Fix incorrect layer_output when using the latest version of transformers
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* use assertEqual instead of assertTrue
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
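For context, the generic pattern behind this assertion change (illustrative test, not taken from the suite):

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_values_match(self):
        out, expected = [1, 2, 3], [1, 2, 3]
        # assertTrue(out == expected) only reports "False is not true" on failure;
        # assertEqual(out, expected) prints both values, making CI failures easier to debug.
        self.assertEqual(out, expected)

if __name__ == "__main__":
    unittest.main()
```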
* Remove the "disable_exllama" and "exllama_config" arguments from load_quantized_model()
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
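Roughly, call sites simply drop the removed kwargs and let GPT-QModel pick the kernel. A sketch assuming the optimum.gptq loader; the model id and save folder are placeholders:

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM
from optimum.gptq import load_quantized_model  # assumed to be the loader referenced above

# Build an empty (meta-device) model to load the quantized weights into.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)
empty_model.tie_weights()

# disable_exllama / exllama_config are no longer accepted here;
# kernel/backend selection is delegated to GPT-QModel.
model = load_quantized_model(empty_model, save_folder="path/to/gptq-checkpoint", device_map="auto")
```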
* fix gptq unittest
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* fix gptq unittest
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* Set offload_to_disk=False
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* remove auto_gptq
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* format
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* format
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* fix
* fix
* remove ExllamaV1 Test
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
* update ci
* test
* avoid uv
* re-enable uv and pin to >= 5.6.12
* Update .github/workflows/test_gptq.yml
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
---------
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: IlyasMoutawwakil <moutawwakil.ilyas.tsi@gmail.com>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>