Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258)
* choose speculation lookahead dynamically based on probability
* update peer finished condition
* add support for do_sample=True
* add stopping criteria
* gitignore
* add print
* remove prints
* minor
* minor
* git ignore
* add test for the ConfidenceCriteria stopping criterion
* doc + format
* add doc
* Update .gitignore
* update docstring and default value of assistant_confidence_threshold
* add docstring
* Update src/transformers/generation/configuration_utils.py
implicit default value (None)
* style fix
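
For readers skimming this log, the idea behind the confidence-based stopping criterion can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code added in this PR: the function name `count_draft_tokens`, its parameters, and the example threshold of 0.4 are all hypothetical. It also assumes that the token which falls below the threshold is still kept before drafting stops.

```python
from typing import Sequence


def count_draft_tokens(
    token_probs: Sequence[float],
    confidence_threshold: float = 0.4,  # hypothetical value, for illustration only
    max_draft_tokens: int = 5,          # hypothetical upper bound on the lookahead
) -> int:
    """Return how many draft tokens the assistant would emit in one round.

    token_probs holds the probability the assistant model assigned to each
    token it drafted, in generation order. Drafting stops once a token's
    probability drops below the threshold or the upper bound is reached.
    """
    kept = 0
    for prob in token_probs:
        if kept >= max_draft_tokens:
            break  # fixed upper bound on the lookahead reached
        kept += 1  # assumption: the low-confidence token itself is still emitted
        if prob < confidence_threshold:
            break  # assistant is no longer confident, stop drafting early
    return kept


if __name__ == "__main__":
    # Confidence drops at the third token, so drafting stops there.
    print(count_draft_tokens([0.9, 0.8, 0.3, 0.95]))
```

The effect is that the number of speculative tokens varies per round instead of being fixed, so the main model verifies fewer drafts that are likely to be rejected.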
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>