Add GPTQ Quantization (#1216)

Commit

2 years ago

Add GPTQ Quantization (#1216)

* v1 test draft
* code runs but outputs gibberish.
* draft v1.1
* remove duplicate
* remove dep to transformers and cleaning
* Add serialization and loading
* Clean code and doc
* add flexibility
* remove triton
* remove some dep with transformers
* add testing
* make style
* add accelerate flag
* handle device placement
* make style
* Apply suggestions
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
  Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* add doc in data.py
* apply suggestion for utils file
* remove multiple output
* fix Optional
* Apply suggestions from code review
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
  Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* remove useless check
* fix doc and style
* fix name
* replace catcher by prefoward hook
* update doctstring for true_sequential
* apply suggestion
* Fix import
* Add docstring for tests
* move args
* fix typo
* fix cpu offload and tokenizer
* fix typo
* fix offload cpu
* modify attribute
* more explicit error
* dataset optional
* add tqdm bar instead
* style
* add doc
* replace by tqdm.auto
* change model
* add CI
* Apply suggestions from code review
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update .github/workflows/test_gptq.yml
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add peft compatibility
* Apply suggestions from code review doc
  Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* merge examples
* code review
* fix test
* make style
* change var
* fix doc
* add exllama
* change naming
* more doc

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

References

#1216 - Add GPTQ Quantization

Author

SunMarc

Parents

94bf7669

optimum 9f2943eb - Add GPTQ Quantization (#1216)
