Add GPTQ Quantization (#1216)
* v1 test draft
* code runs but outputs gibberish.
* draft v1.1
* remove duplicate
* remove dep on transformers and clean up
* Add serialization and loading
* Clean code and doc
* add flexibility
* remove triton
* remove some deps on transformers
* add testing
* make style
* add accelerate flag
* handle device placement
* make style
* Apply suggestions
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* add doc in data.py
* apply suggestion for utils file
* remove multiple output
* fix Optional
* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* remove useless check
* fix doc and style
* fix name
* replace catcher by pre-forward hook
* update docstring for true_sequential
* apply suggestion
* Fix import
* Add docstring for tests
* move args
* fix typo
* fix cpu offload and tokenizer
* fix typo
* fix offload cpu
* modify attribute
* more explicit error
* dataset optional
* add tqdm bar instead
* style
* add doc
* replace by tqdm.auto
* change model
* add CI
* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update .github/workflows/test_gptq.yml
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add peft compatibility
* Apply suggestions from code review doc
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* merge examples
* code review
* fix test
* make style
* change var
* fix doc
* add exllama
* change naming
* more doc
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>