Add fast path for bidirectional mask creation to fix regression (#41586)
* fixed performance regression
* also fixed the older_torch function
* Update src/transformers/masking_utils.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* fix
* more general
* fix slicing
* fix data dependent
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>