I-BERT model support (#10153)
* IBertConfig, IBertTokentizer added
* IBert Model names moified
* tokenizer bugfix
* embedding -> QuantEmbedding
* quant utils added
* quant_mode added to configuration
* QuantAct added, Embedding layer + QuantAct addition
* QuantAct added
* unused path removed, QKV quantized
* self attention layer all quantized, except softmax
* temporarl commit
* all liner layers quantized
* quant_utils bugfix
* bugfix: requantization missing
* IntGELU added
* IntSoftmax added
* LayerNorm implemented
* LayerNorm implemented all
* names changed: roberta->ibert
* config not inherit from ROberta
* No support for CausalLM
* static quantization added, quantize_model.py removed
* import modules uncommented
* copyrights fixed
* minor bugfix
* quant_modules, quant_utils merged as one file
* import * fixed
* unused runfile removed
* make style run
* configutration.py docstring fixed
* refactoring: comments removed, function name fixed
* unused dependency removed
* typo fixed
* comments(Copied from), assertion string added
* refactoring: super(..) -> super(), etc.
* refactoring
* refarctoring
* make style
* refactoring
* cuda -> to(x.device)
* weight initialization removed
* QuantLinear set_param removed
* QuantEmbedding set_param removed
* IntLayerNorm set_param removed
* assert string added
* assertion error message fixed
* is_decoder removed
* enc-dec arguments/functions removed
* Converter removed
* quant_modules docstring fixed
* conver_slow_tokenizer rolled back
* quant_utils docstring fixed
* unused aruments e.g. use_cache removed from config
* weight initialization condition fixed
* x_min, x_max initialized with small values to avoid div-zero exceptions
* testing code for ibert
* test emb, linear, gelu, softmax added
* test ln and act added
* style reformatted
* force_dequant added
* error tests overrided
* make style
* Style + Docs
* force dequant tests added
* Fix fast tokenizer in init
* Fix doc
* Remove space
* docstring, IBertConfig, chunk_size
* test_modeling_ibert refactoring
* quant_modules.py refactoring
* e2e integration test added
* tokenizers removed
* IBertConfig added to tokenizer_auto.py
* bugfix
* fix docs & test
* fix style num 2
* final fixes
Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>