transformers
1dcb022e - chore(pixtral): omit block attention mask when using flash attention (#38741)

chore(pixtral): omit block attention mask when using flash attention (#38741)

* chore(pixtral): omit block attention mask when using flash attention

  Since flash_attention_2 relies solely on position_ids, omitting the block attention mask avoids unnecessary memory usage and prevents OOM on large inputs.

* remove unnecessary attention_mask assignment
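To illustrate the memory argument in the commit body: a dense block-diagonal attention mask costs O(total_len²) entries, while the position_ids that flash attention actually consumes cost O(total_len). The sketch below is illustrative only; the function names and shapes are assumptions, not the actual Pixtral implementation.

```python
def block_attention_mask(image_sizes):
    """Hypothetical dense block-diagonal mask: O(total_len ** 2) entries.
    Tokens may only attend within their own image block."""
    total = sum(image_sizes)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for size in image_sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                mask[i][j] = True
        start += size
    return mask


def position_ids(image_sizes):
    """Per-block positions: O(total_len) entries, which is all that a
    flash-attention-style kernel needs to recover block boundaries."""
    ids = []
    for size in image_sizes:
        ids.extend(range(size))
    return ids


# Two images contributing 3 and 2 patch tokens (toy numbers).
sizes = [3, 2]
mask = block_attention_mask(sizes)   # 5 x 5 = 25 booleans
ids = position_ids(sizes)            # 5 integers: [0, 1, 2, 0, 1]
```

For large multi-image inputs the quadratic mask dominates activation memory, which is consistent with the OOM the commit describes; dropping it when flash attention is active removes that allocation entirely.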