transformers
b890df15 - Warn about forgetting attention mask functions (#45811)

Warn about forgetting attention mask functions (#45811)

* Warn about forgetting attention mask functions

As currently implemented, registering a custom attention function without also registering a custom attention mask function sets the mask to `None`. This can cause unexpected issues: causal masking is silently discarded when you only try to change a minor detail of the attention implementation of an existing model, with grave consequences for overall performance (the model starts to cheat). This happened to me in one of my recent projects.

To hopefully spare future readers of the docs this problem, this PR adds:

1. An explicit warning that a mask function should likely be registered.
2. Copyable code to register the sdpa mask as a default (so that code copied straight from the docs does not fall victim to this issue).

* Apply reformulation suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Make wording more accurate (and somewhat stronger)

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply proper wording

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
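The pitfall the commit describes can be illustrated with a minimal pure-Python sketch. The registry names and functions below are hypothetical (this is not the actual transformers API): the point is that when attention and mask functions live in separate registries, registering only the attention half makes the mask lookup return `None`, silently dropping causal masking.

```python
# Hypothetical registries mimicking the described behavior; not the real
# transformers internals.
ATTENTION_FUNCTIONS = {}
MASK_FUNCTIONS = {}


def register_attention(name, attn_fn, mask_fn=None):
    """Register a custom attention implementation. If no mask function is
    supplied, later mask lookups return None -- the pitfall described above."""
    ATTENTION_FUNCTIONS[name] = attn_fn
    if mask_fn is not None:
        MASK_FUNCTIONS[name] = mask_fn


def causal_mask(seq_len):
    # Lower-triangular mask: position j may attend to positions i <= j.
    return [[i <= j for i in range(seq_len)] for j in range(seq_len)]


def get_mask_fn(name):
    # Mirrors the reported behavior: a missing entry yields None, no warning.
    return MASK_FUNCTIONS.get(name)


# Registering only the attention function leaves the mask as None:
register_attention("my_attn_variant", attn_fn=lambda q, k, v, mask: None)
print(get_mask_fn("my_attn_variant"))  # -> None: causal masking silently lost

# Registering a mask function alongside it avoids the problem:
register_attention(
    "my_safe_variant",
    attn_fn=lambda q, k, v, mask: None,
    mask_fn=causal_mask,
)
print(get_mask_fn("my_safe_variant") is causal_mask)  # -> True
```

This is why the PR recommends always registering a mask function (e.g. the sdpa mask) as a default next to any custom attention function.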