transformers
2ad152f8 - In Llama4 fix wrongly inverted causal attention mask when using SDPA implementation (#38094)

In Llama4 fix wrongly inverted causal attention mask when using SDPA implementation (#38094)

When the causal attention mask is prepared at this point, it arrives as a float tensor whose masked positions hold the dtype's minimum value. Converting it to bool and treating it as a boolean mask is incorrect, because that conversion inverts the mask: `torch.nn.functional.scaled_dot_product_attention` expects masked positions to be `False`. I suspect the `sdpa` implementation variant has not been thoroughly tested, which is why this error was not caught earlier.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
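For illustration, here is a minimal sketch (not the actual Llama4 modeling code; the shapes and variable names are made up) of why a naive `.bool()` cast on an additive float mask flips its meaning relative to what `torch.nn.functional.scaled_dot_product_attention` expects:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 4, 8)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# Additive float mask: 0.0 where attention is allowed, dtype min where masked.
min_val = torch.finfo(q.dtype).min
float_mask = torch.triu(torch.full((4, 4), min_val), diagonal=1)

# Buggy cast: the nonzero (masked) entries become True, but SDPA's boolean
# attn_mask uses True to mean "this position may be attended to".
wrong_bool_mask = float_mask.bool()

# Correct conversion: the allowed positions (value 0.0) must map to True.
right_bool_mask = float_mask == 0
assert torch.equal(right_bool_mask, ~wrong_bool_mask)  # the cast inverted the mask

out = F.scaled_dot_product_attention(q, k, v, attn_mask=right_bool_mask)
print(out.shape)  # torch.Size([1, 2, 4, 8])
```

With the inverted mask, each query would attend only to the positions it should have been blocked from, which is the behavior this commit fixes.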