transformers
92abe603 - >3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)

Commit

1 year ago

>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)

* draft
* apply changes to all relevant archs
* rerun ci - check_docstrings.py failing?
* fix docstring
* move 2D->4D mask creation to modeling file
* repo consistency
* fix the batch size = 1 case - calling contiguous is not enough
* nit
* style
* propagate to gemma/gemma-2
* prepare inputs for gemma generation
* implement test and tiny fix in gemma2
* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies
* ci pass
* fix gemma's test_compile_static_cache tests
* flacky
* retrigger ci

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

References

#32227 - >3-5x faster torch.compile forward compilation for autoregressive decoder models

Author

fxmarty

Parents

b46bd8b9
