Fix llama model sdpa attention forward function masking bug when output_attentions=True (#30652)
* Fix llama model forward function with output_attentions=True and a same-length encoded sequence.
* Fix style
* Propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same SDPA masking logic from llama)
* Fix style
* Skip the unnecessary SDPA mask-converter shortcut when output_attentions=True, since the eager fallback still needs the explicit mask (sketched below)
* Add tests checking that SDPA and eager outputs match when output_attentions=True (see the test sketch at the end)
* Split if statements into two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting
* Add the fix to the new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
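For context, here is a minimal sketch of the idea behind the fix, assuming a simplified, hypothetical `update_causal_mask` helper (the real logic lives in `modeling_llama.py` and related files and differs in names and details): when output_attentions=True the SDPA path falls back to the eager implementation, so the shortcut that drops the mask and relies on `is_causal=True` in `torch.nn.functional.scaled_dot_product_attention` must not be taken.

```python
import torch

def update_causal_mask(attention_mask, inputs_embeds, attn_implementation, output_attentions):
    """Return a 4D additive causal mask, or None when SDPA can rely on is_causal=True.

    Simplified illustration; not the actual transformers implementation.
    """
    batch, seq_len = inputs_embeds.shape[:2]
    can_skip_mask = (
        attn_implementation == "sdpa"
        and not output_attentions  # the fix: the eager fallback still needs the mask
        and (attention_mask is None or bool(attention_mask.all()))
    )
    if can_skip_mask:
        return None  # SDPA applies causality internally via is_causal=True

    # Build an explicit additive causal mask for the eager / masked path.
    min_value = torch.finfo(inputs_embeds.dtype).min
    causal = torch.full((seq_len, seq_len), min_value, dtype=inputs_embeds.dtype)
    causal = torch.triu(causal, diagonal=1)
    return causal[None, None, :, :].expand(batch, 1, seq_len, seq_len)
```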
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
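A hedged sketch of the kind of test added: load the same checkpoint with the "eager" and "sdpa" attention implementations and check that logits and attention weights agree when output_attentions=True. The checkpoint name and tolerances below are illustrative, not taken from the PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # illustrative tiny checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The quick brown fox", return_tensors="pt")

eager = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="eager")
sdpa = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="sdpa")

with torch.no_grad():
    out_eager = eager(**inputs, output_attentions=True)
    out_sdpa = sdpa(**inputs, output_attentions=True)

# With the mask fix in place, both implementations should produce matching outputs.
torch.testing.assert_close(out_eager.logits, out_sdpa.logits, rtol=1e-4, atol=1e-4)
for a_eager, a_sdpa in zip(out_eager.attentions, out_sdpa.attentions):
    torch.testing.assert_close(a_eager, a_sdpa, rtol=1e-4, atol=1e-4)
```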