transformers
7a51cbc6 - Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258)

Commit

1 year ago

Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258)

* optimal Speculation Lookahead based on probability
* update peer finished condition
* add support to do_sample True
* add stopping criteria
* gitignore
* add print
* remove prints
* minor
* minor
* git ignore
* adding test to stopping ConfidenceCriteria
* doc + format
* add doc
* Update .gitignore
* update docstring and default value of assistant_confidence_threshold
* add docstring
* Update src/transformers/generation/configuration_utils.py

  implicit default value (None)

  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* style fix

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
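
The commit message mentions a confidence-based stopping criterion (ConfidenceCriteria) and an assistant_confidence_threshold setting. The sketch below illustrates the general idea in plain Python and is not the code from this commit: the draft model keeps proposing tokens until its probability for the latest token drops below the threshold, so the number of speculative tokens varies per step. The function name `draft_length`, its parameters, and the choice to still count the low-confidence token are assumptions made for illustration.

```python
from typing import Sequence


def draft_length(
    token_probs: Sequence[float],
    confidence_threshold: float,
    max_new_tokens: int,
) -> int:
    """Hypothetical helper: count draft tokens proposed before drafting stops.

    token_probs holds the draft model's probability for each token it
    would propose, in order. Drafting stops right after the first token
    whose probability is below confidence_threshold, or once
    max_new_tokens tokens have been proposed.
    """
    count = 0
    for prob in token_probs[:max_new_tokens]:
        count += 1  # assumption: the low-confidence token is still proposed
        if prob < confidence_threshold:
            break  # draft model is no longer confident, hand over to the target model
    return count


if __name__ == "__main__":
    # The third token falls below the threshold, so drafting stops there.
    print(draft_length([0.9, 0.8, 0.3, 0.95], confidence_threshold=0.4, max_new_tokens=5))
```

With a fixed lookahead the draft model would always propose `max_new_tokens` tokens; here it stops early once it is unlikely that the target model will accept further tokens.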

References

#33258 - Dynamic number of speculative tokens in order to accelerate speculative decoding

Author

jmamou

Parents

42babe85
