transformers
1dcb022e - chore(pixtral): omit block attention mask when using flash attention (#38741)

chore(pixtral): omit block attention mask when using flash attention (#38741)

* chore(pixtral): omit block attention mask when using flash attention

  Since flash_attention_2 relies solely on position_ids, omitting the block attention mask avoids unnecessary memory usage and prevents OOM on large inputs.

* remove unnecessary attention_mask assignment
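To illustrate the memory argument in the commit body: a dense block-diagonal attention mask costs O(total_len²) entries, while the position_ids that flash attention actually consumes cost O(total_len). The sketch below is illustrative only; the function names and shapes are assumptions, not the actual Pixtral implementation.

```python
def block_attention_mask(image_sizes):
    """Hypothetical dense block-diagonal mask: O(total_len ** 2) entries.
    Tokens may only attend within their own image block."""
    total = sum(image_sizes)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for size in image_sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                mask[i][j] = True
        start += size
    return mask


def position_ids(image_sizes):
    """Per-block positions: O(total_len) entries, which is all that a
    flash-attention-style kernel needs to recover block boundaries."""
    ids = []
    for size in image_sizes:
        ids.extend(range(size))
    return ids


# Two images contributing 3 and 2 patch tokens (toy numbers).
sizes = [3, 2]
mask = block_attention_mask(sizes)   # 5 x 5 = 25 booleans
ids = position_ids(sizes)            # 5 integers: [0, 1, 2, 0, 1]
```

For large multi-image inputs the quadratic mask dominates activation memory, which is consistent with the OOM the commit describes; dropping it when flash attention is active removes that allocation entirely.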