Fix attention sink implementation in flex attention (#41083)
* Fix attention sink implementation in flex attention
* fix dim
* fix
* Remove print
* Raise error when return_lse is False yet s_aux is provided (see the sketch below)
* Clean test files for merge
* Update src/transformers/integrations/flex_attention.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* force return lse
* Add to doc
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
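
For context, here is a minimal sketch (not the merged `src/transformers/integrations/flex_attention.py` code) of how an attention sink can sit on top of PyTorch's `flex_attention` using the log-sum-exp it can return. The helper name `flex_attention_with_sink`, the per-head shape of `s_aux`, and the `sigmoid(lse - s_aux)` renormalization are illustrative assumptions; it assumes a PyTorch version whose `flex_attention` accepts `return_lse`.

```python
# Sketch: attention sink applied as a post-hoc rescaling of flex_attention output.
# Adding a learnable sink logit s_aux to the softmax denominator is equivalent to
# multiplying the plain attention output by sigmoid(lse - s_aux), where lse is the
# logsumexp of the (scaled) attention scores.
import torch
from torch.nn.attention.flex_attention import flex_attention


def flex_attention_with_sink(query, key, value, s_aux=None, return_lse=True):
    """query/key/value: (B, H, S, D); s_aux: hypothetical per-head sink logits, shape (H,)."""
    if s_aux is not None and not return_lse:
        # The sink correction needs the lse, so refuse instead of silently ignoring s_aux.
        raise ValueError("s_aux requires return_lse=True so the output can be renormalized.")

    # Assumes lse is the natural-log logsumexp of the scaled scores, shape (B, H, S).
    out, lse = flex_attention(query, key, value, return_lse=True)

    if s_aux is not None:
        sink = s_aux.view(1, -1, 1)  # broadcast over batch and sequence
        # exp(lse) / (exp(lse) + exp(s_aux)) == sigmoid(lse - s_aux)
        out = out * torch.sigmoid(lse - sink).unsqueeze(-1).to(out.dtype)

    return (out, lse) if return_lse else out


if __name__ == "__main__":
    b, h, s, d = 1, 4, 16, 32
    q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
    s_aux = torch.zeros(h)  # hypothetical learned sink logits; zeros for the demo
    out, lse = flex_attention_with_sink(q, k, v, s_aux=s_aux)
    print(out.shape, lse.shape)  # (1, 4, 16, 32) (1, 4, 16)
```

Per the "force return lse" commit above, the integration forces the lse to be retrieved internally when `s_aux` is present; the explicit raise here only keeps the sketch's precondition visible.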