transformers
Proper performant flex attention implementation #36103
Closed

bursteratom wants to merge 22 commits into huggingface:main from bursteratom:proper_flex
vasqu commented on 2025-02-09
ArthurZucker commented on 2025-02-10
molbap commented on 2025-02-10
bursteratom force pushed from 516de450 to ae9a2b02 205 days ago
bursteratom requested a review from molbap 205 days ago
bursteratom requested a review from vasqu 205 days ago
bursteratom requested a review from ArthurZucker 205 days ago
bursteratom force pushed from 4c525764 to 7519b3cc 204 days ago
bursteratom force pushed from 7519b3cc to 8c28c9db 204 days ago
vasqu approved these changes on 2025-02-12
bursteratom force pushed from 8e50735f to e705da7b 203 days ago
bursteratom force pushed from 432bafaa to 74833140 203 days ago
ArthurZucker commented on 2025-02-14
bursteratom force pushed from 5a0cb2dd to 1dea5a8f 199 days ago
bursteratom force pushed from fb9c4c63 to 3d9377fb 195 days ago
bursteratom force pushed from 0bba7283 to 0c200a05 191 days ago
bursteratom force pushed from 2406dabc to ff2a4556 191 days ago
bursteratom force pushed from 99e62c0a to c50468c1 191 days ago
bursteratom force pushed from 8aaeda8c to 864efb28 187 days ago
ArthurZucker approved these changes on 2025-03-01
shethaadit approved these changes on 2025-03-03
Commits by bursteratom:
800a7e70  proper performant flex attention implementation
c331bb3f  wrapper for flex attention to compile only when triggered
e1438ade  wrapper for flex attention to compile only when triggered
68bd4e6a  attention mask type detection
cf0ad129  Update src/transformers/integrations/flex_attention.py
2afa102d  nit
a78f7bc1  nit
9d1ee83d  nit
c2691909  nit
4e58c63f  gemma2 support
6237ae4a  add citation for torchtune
eb254a8d  Update src/transformers/models/llama/modeling_llama.py
f593d3a8  Update flex_attention.py
6cf7ea91  nit
4da79476  nit
743ab13c  nit
5ab35829  reset gemma2 modifications
ad63890e  nit
f3b3bae7  nit
43871f7e  nit
f0945806  licencing
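The two "wrapper for flex attention to compile only when triggered" commits point at the core idea of the PR: `torch.compile` is what makes `flex_attention` performant, but compiling it eagerly would tax every model load, so compilation is deferred until flex attention is actually invoked. A minimal sketch of that pattern, assuming PyTorch >= 2.5; the helper name `lazy_flex_attention` and the module-level cache are illustrative, not the PR's actual code:

```python
# Hypothetical sketch: compile flex_attention lazily, on first use only.
# Assumes PyTorch >= 2.5; names here are illustrative, not the PR's code.
import torch
from torch.nn.attention.flex_attention import flex_attention

_compiled_flex_attention = None  # populated the first time it is needed


def lazy_flex_attention(query, key, value, block_mask=None):
    """Run flex_attention, paying the torch.compile cost only once,
    and only if flex attention is actually triggered."""
    global _compiled_flex_attention
    if _compiled_flex_attention is None:
        _compiled_flex_attention = torch.compile(flex_attention)
    return _compiled_flex_attention(query, key, value, block_mask=block_mask)


if __name__ == "__main__":
    # Toy shapes: (batch, heads, seq_len, head_dim).
    q = torch.randn(1, 2, 128, 64)
    k = torch.randn(1, 2, 128, 64)
    v = torch.randn(1, 2, 128, 64)
    out = lazy_flex_attention(q, k, v)  # first call triggers compilation
    print(out.shape)  # torch.Size([1, 2, 128, 64])
```

Note that on older PyTorch releases `flex_attention` is CUDA-only, so the toy call above may need tensors moved to a GPU device.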
bursteratom force pushed from 864efb28 to f0945806 184 days ago
ArthurZucker added the flex attention label
ArthurZucker added the Compilation label
b81e90f1  Merge branch 'main' into proper_flex
bursteratom closed this 177 days ago
bursteratom reopened this 177 days ago
github-actions marked this pull request as draft 177 days ago
bursteratom closed this 177 days ago
