transformers
acc968c5 - [CP] Add attention_mask to the buffer when the mask is causal (#40619)

Committed 103 days ago
Fix attention mask validation for context parallelism.

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>