[`Attn Masks`] Non-vmap default for attention masks (#41852)
* attempt 1
* fixup masking to work correctly with old torch
* a few changes to make things a bit cleaner
* oopsie
* fix integer overflow on bidirectional masks via indexing fn
* rm executorch workarounds --> sliding etc. fns still need to be handled properly
* typo
* docs, fix older torch inplace issue, proper kwarg handling
* chunked works with non-vmap and older torch, add warning on non-guaranteed masks
* lift unnecessary restriction on older torch
* simplify a few things, restrict torch < 2.6 to non-vmap (for now)
* try fix
* remove unnecessary slicing logic
* remove legacy func
* harmonize slightly
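To illustrate the vmap vs. non-vmap distinction these commits revolve around: a vmap-style path evaluates a scalar mask function at every (q_idx, kv_idx) position, while a non-vmap path builds the same mask with one comparison over index ranges (broadcasting in torch). The sketch below is illustrative only, it is not the actual transformers `masking_utils` code; function names are made up for the example, and plain Python lists stand in for tensors.

```python
def causal_mask_fn(q_idx: int, kv_idx: int) -> bool:
    # Per-position rule, as a vmap-style mask function would express it:
    # a query may attend only to keys at or before its own position.
    return kv_idx <= q_idx

def build_mask_vmap_style(q_len: int, kv_len: int) -> list[list[bool]]:
    # vmap-like: apply the scalar rule at every (q, kv) index pair.
    return [[causal_mask_fn(q, k) for k in range(kv_len)] for q in range(q_len)]

def build_mask_indexing(q_len: int, kv_len: int) -> list[list[bool]]:
    # Non-vmap: compare index ranges directly (in torch this would be
    # a single broadcasted op: torch.arange(kv_len) <= torch.arange(q_len)[:, None]),
    # avoiding per-element arithmetic on large indices.
    kv = range(kv_len)
    return [[k <= q for k in kv] for q in range(q_len)]
```

Both produce the same boolean mask; the indexing form is the one that works on older torch versions without `torch.vmap`.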
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>