transformers
7a51cbc6 - Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258)

Commit
1 year ago
Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258)

* optimal Speculation Lookahead based on probability
* update peer finished condition
* add support to do_sample True
* add stopping criteria
* gitignore
* add print
* remove prints
* minor
* minor
* git ignore
* adding test to stopping ConfidenceCriteria
* doc + format
* add doc
* Update .gitignore
* update docstring and default value of assistant_confidence_threshold
* add docstring
* Update src/transformers/generation/configuration_utils.py

implicit default value (None)

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* style fix

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
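The commit message names a confidence-based stopping criterion (ConfidenceCriteria) and an assistant_confidence_threshold setting, but does not show the logic. The sketch below illustrates the general idea in plain Python under stated assumptions: the draft (assistant) model keeps proposing tokens until the probability it assigned to its latest token falls below a threshold, or a fixed cap is reached. The function name `draft_until_unconfident`, its parameters, and the choice to keep the low-confidence token before stopping are hypothetical and are not taken from the commit's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def draft_until_unconfident(step_logits, chosen_tokens, threshold, max_tokens):
    """Hypothetical helper: decide how many draft tokens to keep.

    step_logits   -- one list of logits per drafting step (assumed input)
    chosen_tokens -- token index the draft model picked at each step
    threshold     -- stop once the picked token's probability is below this
    max_tokens    -- hard cap on the number of drafted tokens
    """
    drafted = []
    for logits, token in zip(step_logits, chosen_tokens):
        if len(drafted) >= max_tokens:
            break
        drafted.append(token)
        confidence = softmax(logits)[token]
        # Assumption: the low-confidence token is kept, then drafting stops,
        # so the target model still gets a chance to verify it.
        if confidence < threshold:
            break
    return drafted


# Three-token vocabulary; the third step is a uniform (uncertain) distribution.
example_logits = [
    [5.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 6.0],
]
example_tokens = [0, 1, 2, 2]

kept = draft_until_unconfident(example_logits, example_tokens, threshold=0.4, max_tokens=5)
print(len(kept), "draft tokens kept:", kept)
```

With a threshold of 0.4 the sketch stops after the third step, because the uniform distribution gives the chosen token a probability of about 0.33. Setting the threshold to 0 disables early stopping, so only the cap limits the draft length.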