[`Attn Masks`] Non-vmap default for attention masks (#41852)
* attempt 1
* fixup masking to work correctly with old torch
* a few changes to make things a bit cleaner
* oopsie
* fix integer overflow on bidirectional masks via indexing fn
* rm executorch workarounds --> sliding etc. fns still need to be handled properly
* typo
* docs, fix older torch inplace issue, proper kwarg handling
* chunked works with non-vmap and older torch, add warning on non-guaranteed masks
* lift unnecessary restriction on older torch
* simplify a few things, restrict torch < 2.6 to non-vmap (for now)
* try fix
* remove unnecessary slicing logic
* remove legacy func
* harmonize slightly
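To illustrate the vmap vs. non-vmap distinction these commits revolve around: a vmap-style path evaluates a scalar mask function at every (q_idx, kv_idx) position, while a non-vmap path builds the same mask with one comparison over index ranges (broadcasting in torch). The sketch below is illustrative only, it is not the actual transformers `masking_utils` code; function names are made up for the example, and plain Python lists stand in for tensors.

```python
def causal_mask_fn(q_idx: int, kv_idx: int) -> bool:
    # Per-position rule, as a vmap-style mask function would express it:
    # a query may attend only to keys at or before its own position.
    return kv_idx <= q_idx

def build_mask_vmap_style(q_len: int, kv_len: int) -> list[list[bool]]:
    # vmap-like: apply the scalar rule at every (q, kv) index pair.
    return [[causal_mask_fn(q, k) for k in range(kv_len)] for q in range(q_len)]

def build_mask_indexing(q_len: int, kv_len: int) -> list[list[bool]]:
    # Non-vmap: compare index ranges directly (in torch this would be
    # a single broadcasted op: torch.arange(kv_len) <= torch.arange(q_len)[:, None]),
    # avoiding per-element arithmetic on large indices.
    kv = range(kv_len)
    return [[k <= q for k in kv] for q in range(q_len)]
```

Both produce the same boolean mask; the indexing form is the one that works on older torch versions without `torch.vmap`.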
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>