transformers
be3fd8a2 - [Flash Attention 2] Add flash attention 2 for GPT-J (#28295)

Commit

1 year ago

[Flash Attention 2] Add flash attention 2 for GPT-J (#28295)

* initial implementation of flash attention for gptj
* modify flash attention and overwrite test_flash_attn_2_generate_padding_right
* update flash attention support list
* remove the copy line in the `CodeGenBlock`
* address copy mechanism
* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add GPTJ attention classes
* add expected outputs in the gptj test
* Ensure repo consistency with 'make fix-copies'

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
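
For readers who want to try the feature this commit adds, the following is a minimal sketch of how GPT-J might be loaded with Flash Attention 2 through the `attn_implementation` argument of `from_pretrained`. The checkpoint name `EleutherAI/gpt-j-6b` and the helper names `pick_attn_implementation` and `load_gptj` are assumptions for illustration and do not come from the commit. The fallback to `"eager"` is likewise an illustrative choice rather than library behavior.

```python
import importlib.util


def pick_attn_implementation(flash_available: bool) -> str:
    """Return the attention backend name to pass to `from_pretrained`.

    Hypothetical helper: falls back to the default "eager" attention
    when Flash Attention 2 cannot be used.
    """
    return "flash_attention_2" if flash_available else "eager"


def load_gptj(model_id: str = "EleutherAI/gpt-j-6b"):
    """Load GPT-J, requesting Flash Attention 2 when it looks usable.

    The model id is an assumed checkpoint name, not taken from the commit.
    """
    # Heavy imports stay local so the helper above works without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM

    # Flash Attention 2 needs the `flash_attn` package and a CUDA device.
    flash_available = (
        importlib.util.find_spec("flash_attn") is not None
        and torch.cuda.is_available()
    )
    # Flash Attention 2 only supports half precision, so use float16 with it.
    dtype = torch.float16 if flash_available else torch.float32
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=dtype,
        attn_implementation=pick_attn_implementation(flash_available),
    )
```

Calling `load_gptj()` downloads the full checkpoint, so it is best run on a machine with enough disk space and GPU memory.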

References

#28295 - [Flash Attention 2] Add flash attention 2 for GPT-J

#29969 - [SigLIP] Add fast tokenizer

#32831 - [Docs] Update resources

#33111 - [Backbone] Remove out_features everywhere

#33174 - [Zero-shot image classification pipeline] Remove tokenizer_kwargs

#39821 - Support MetaCLIP 2

#58 - Add EoMT DINOv3 model

#59 - Fix attention mask handling in EoMT-DINOv3 converter

#41212 - Add EoMT with DINOv3 backbone

#62 - Add initial DEIMv2 model implementation

Author

bytebarde

Parents

d522afea
