In Llama4, fix wrongly inverted causal attention mask when using the SDPA implementation (#38094)
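When the causal attention mask is prepared at this point, it comes in
as a float (additive) mask: masked positions hold the dtype's minimum
value and unmasked positions hold 0.
Converting it to bool and treating it as a boolean mask is not correct,
because this inverts the mask. The cast maps 0 to `True`'s opposite,
`False`, and the minimum value to `True`, so the positions that should
be attended become `False` and the masked ones become `True`.
`torch.nn.functional.scaled_dot_product_attention` expects masked
positions to be `False` and positions that may be attended to be `True`.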
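A minimal sketch of the inversion and one way to avoid it, assuming a plain square causal mask built with `torch.finfo(dtype).min` for masked positions. This is not the code from the Llama4 change itself; the helper name `float_mask_to_bool` is hypothetical and used only for illustration. Comparing against 0 recovers the boolean mask SDPA expects, while a plain `.bool()` cast yields its complement.

```python
import torch
import torch.nn.functional as F


def float_mask_to_bool(float_mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper. In the additive float mask, 0.0 means "attend"
    # and the dtype's minimum value means "masked". SDPA's boolean mask
    # uses True for "attend", so compare against 0 instead of casting.
    return float_mask == 0


torch.manual_seed(0)
seq_len, head_dim = 4, 8
dtype = torch.float32
min_val = torch.finfo(dtype).min

# Additive causal mask: min value above the diagonal, 0.0 on and below it.
float_mask = torch.full((seq_len, seq_len), min_val, dtype=dtype).triu(1)

naive = float_mask.bool()                  # inverted: True where masked
correct = float_mask_to_bool(float_mask)   # True where attending is allowed
causal = torch.ones(seq_len, seq_len, dtype=torch.bool).tril()

# Shapes: (batch, heads, seq_len, head_dim); the (seq_len, seq_len) mask broadcasts.
q = torch.randn(1, 1, seq_len, head_dim, dtype=dtype)
k = torch.randn(1, 1, seq_len, head_dim, dtype=dtype)
v = torch.randn(1, 1, seq_len, head_dim, dtype=dtype)

out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=correct)
out_float = F.scaled_dot_product_attention(q, k, v, attn_mask=float_mask)
out_ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Passing `naive` as `attn_mask` instead would let the first query attend only to future positions and fully mask the last query's row, which is the bug described here.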
I suspect that the `sdpa` implementation variant may not have been
thoroughly tested and that is why this error was not caught earlier.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>