transformers
22e6f145 - Reducing memory usage: removing useless logits computation in generate() (#31292)

Commit

1 year ago

Reducing memory usage: removing useless logits computation in generate() (#31292)

* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refactor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
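The idea behind `num_logits_to_keep` can be illustrated with a minimal, dependency-free sketch. It is not the code from this commit: `lm_head`, `forward`, and `supports_num_logits_to_keep` are hypothetical stand-ins, plain lists replace tensors, and the convention that `0` means "keep all positions" is an assumption chosen so that a single slice works without an if-else. The sketch shows that slicing the hidden states before the vocabulary projection yields the same last-token logits while projecting fewer positions, and that support for the argument can be detected with `inspect.signature()`.

```python
import inspect


def lm_head(hidden_states, vocab_rows):
    """Project each hidden state (length d) onto every vocabulary row (length d)."""
    return [
        [sum(h * w for h, w in zip(state, row)) for row in vocab_rows]
        for state in hidden_states
    ]


def forward(hidden_states, vocab_rows, num_logits_to_keep=0):
    """Hypothetical forward pass: return logits for the last N positions only.

    Assumption: num_logits_to_keep == 0 keeps every position, because
    hidden_states[-0:] is the same slice as hidden_states[0:].
    """
    kept = hidden_states[-num_logits_to_keep:]
    return lm_head(kept, vocab_rows)


def supports_num_logits_to_keep(fn):
    """Check whether a callable accepts a `num_logits_to_keep` parameter."""
    return "num_logits_to_keep" in inspect.signature(fn).parameters


# Toy data: 3 positions, hidden size 2, vocabulary size 3.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vocab = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full_logits = forward(hidden, vocab)                        # projects all 3 positions
last_logits = forward(hidden, vocab, num_logits_to_keep=1)  # projects only the last one

print(len(full_logits), len(last_logits))
print(last_logits == full_logits[-1:])
print(supports_num_logits_to_keep(forward), supports_num_logits_to_keep(lm_head))
```

During generation only the final position's logits are needed to pick the next token, so in a real model the saving scales with prompt length times vocabulary size.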

References

#31292 - Reducing memory usage: removing useless logits computation in generate()

Author

Cyrilvallez

Parents

d806fa3e
