transformers
2ad152f8 - In Llama4 fix wrongly inverted causal attention mask when using SDPA implementation (#38094)

Commit

282 days ago

At this point in preparing the causal attention mask, the mask arrives as a float tensor in which masked positions hold the min value (the most negative value of the dtype) and attended positions hold 0. Converting it to bool and treating it as a boolean mask is incorrect because this inverts the mask. The nonzero min value (masked) becomes `True`, and 0 (attended) becomes `False`. `torch.nn.functional.scaled_dot_product_attention` expects a masked position to be `False` in a boolean mask.

I suspect that the `sdpa` implementation variant was not thoroughly tested, which is why this error was not caught earlier.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
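The inversion described above can be reproduced in isolation. The following is a minimal standalone sketch, not the Llama4 code path touched by this commit. It assumes a 4x4 causal pattern, random query/key/value tensors, and `torch.finfo(dtype).min` as the masked value in the float mask. It compares a naive `.bool()` cast against an explicit `== 0` comparison.

```python
import torch
import torch.nn.functional as F

# Illustrative float (additive) mask: 0.0 = attend, dtype minimum = masked.
# The 4x4 causal pattern and the use of torch.finfo(dtype).min are assumptions
# for this sketch, not values taken from the Llama4 code.
seq_len = 4
dtype = torch.float32
min_value = torch.finfo(dtype).min
allowed = torch.tril(torch.ones(seq_len, seq_len)).bool()  # True = may attend
float_mask = torch.zeros(seq_len, seq_len, dtype=dtype).masked_fill(~allowed, min_value)

# Buggy conversion: nonzero (masked) -> True, 0.0 (attended) -> False.
inverted_bool_mask = float_mask.bool()

# Correct conversion: SDPA's boolean attn_mask uses True = attend, False = masked.
bool_mask = float_mask == 0

torch.manual_seed(0)
q = torch.randn(1, 1, seq_len, 8)
k = torch.randn(1, 1, seq_len, 8)
v = torch.randn(1, 1, seq_len, 8)

# SDPA accepts either an additive float mask or a boolean mask.
out_float = F.scaled_dot_product_attention(q, k, v, attn_mask=float_mask)
out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)
out_inverted = F.scaled_dot_product_attention(q, k, v, attn_mask=inverted_bool_mask)
```

With the `== 0` conversion, the boolean mask yields the same output as the original float mask. The inverted mask lets each query attend only to strictly later positions and fully masks the last row, so its output is not meaningful.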

References

#38094 - In Llama4 fix wrongly inverted causal attention mask when using SDPA implementation

Author

sogartar

Parents

de70c842
