transformers
Kernels flash attn #39474
Merged

ArthurZucker merged 46 commits into main from kernels-flash-attn
ArthurZucker use partial to wrap around `transformers` utils!
cd4c7cb5
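For context on the "wrap with partial" idea, here is a minimal sketch of pre-binding transformers-side defaults to a kernel entry point with `functools.partial`, so that model call sites only have to pass tensors. The `kernel_flash_attention` stand-in and its signature are hypothetical, not the actual hub kernel API.

```python
from functools import partial

import torch


def kernel_flash_attention(query, key, value, attention_mask=None, dropout=0.0, scale=None):
    # Naive stand-in for a hub-provided flash attention kernel (illustrative only).
    scale = scale if scale is not None else query.shape[-1] ** -0.5
    scores = torch.matmul(query, key.transpose(-2, -1)) * scale
    if attention_mask is not None:
        scores = scores + attention_mask
    return torch.softmax(scores, dim=-1) @ value


# Pre-bind library-side defaults so downstream code only supplies q/k/v (and a mask).
flash_attention_forward = partial(kernel_flash_attention, dropout=0.0, scale=None)
```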
ArthurZucker try to refactor?
005f4821
ArthurZucker revert one wrong change
1b834a4d
ArthurZucker just a nit
d93f366e
ArthurZucker push
2b7d411d
ArthurZucker revert whatever was wrong!
affba20d
ArthurZucker some nits
1959eb28
ArthurZucker fixes when there is no attention mask
888cd402
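A minimal sketch of what handling a missing attention mask can look like: with no mask there is no padding information, so full-length varlen bookkeeping is synthesized instead of unpadding. `maybe_build_varlen_args` is an illustrative helper, not the PR's actual code.

```python
import torch
import torch.nn.functional as F


def maybe_build_varlen_args(hidden_states, attention_mask=None):
    batch, seq_len, _ = hidden_states.shape
    if attention_mask is None:
        # No padding information: treat every sequence as full length.
        cu_seqlens = torch.arange(0, (batch + 1) * seq_len, seq_len, dtype=torch.int32)
        max_seqlen = seq_len
    else:
        seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
        cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
        max_seqlen = int(seqlens.max())
    return cu_seqlens, max_seqlen
```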
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into kerne…
8f5e62b9
ArthurZucker bring the licence back
5a7ae113
ArthurZucker some fixes
c57673bb
ArthurZucker nit
7d69d834
ArthurZucker Merge branch 'kernels-flash-attn' of github.com:huggingface/transform…
7e94910d
ArthurZucker style
112e2a64
ArthurZucker remove prints
501aa7ea
ArthurZucker correct dtype
04088bec
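To illustrate the dtype concern: flash attention kernels generally only accept fp16/bf16 inputs, so a guard like the hypothetical `ensure_flash_dtype` below casts fp32 tensors before the kernel call. This is a sketch of the general technique, not the PR's fix.

```python
import torch


def ensure_flash_dtype(tensor: torch.Tensor, target_dtype: torch.dtype = torch.bfloat16) -> torch.Tensor:
    # Cast anything that is not already half precision to a kernel-supported dtype.
    if tensor.dtype not in (torch.float16, torch.bfloat16):
        return tensor.to(target_dtype)
    return tensor
```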
ArthurZucker added the Flash Attention label
vasqu fa flags for testing
b1e104b0
ArthurZucker update
7087e7b8
ArthurZucker Merge branch 'main' into kernels-flash-attn
cc58aca3
ArthurZucker use paged attention if requested!
6a2996a0
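A rough sketch of the dispatch idea behind this commit: route to a paged-attention path only when the user explicitly requests it, keeping regular flash attention as the default. The registry and function names below are illustrative, not transformers' actual attention interface.

```python
def _flash_attention(*args, **kwargs):
    raise NotImplementedError("regular flash attention path (stand-in)")


def _paged_attention(*args, **kwargs):
    raise NotImplementedError("paged / continuous-batching path (stand-in)")


# Illustrative registry keyed by the requested attention implementation.
ATTENTION_FUNCTIONS = {
    "flash_attention_2": _flash_attention,
    "paged_attention": _paged_attention,
}


def pick_attention(attn_implementation: str):
    # Only use paged attention when explicitly requested; default to flash.
    return ATTENTION_FUNCTIONS.get(attn_implementation, _flash_attention)
```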
ArthurZucker Merge branch 'kernels-flash-attn' of github.com:huggingface/transform…
8ddc5252
ArthurZucker updates
a5862941
ArthurZucker a clone was needed, not sure why
57842f56
ArthurZucker automatically create cu seq lens when input is flash, this at least m…
43b7f322
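For the cu_seqlens idea, a minimal sketch assuming packed ("flash") inputs where sequence boundaries can be read off `position_ids` (every reset to 0 starts a new sequence). `cu_seqlens_from_position_ids` is an illustrative helper, not the PR's exact code.

```python
import torch


def cu_seqlens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    flat = position_ids.view(-1)
    # Every position that resets to zero marks the start of a new packed sequence.
    starts = torch.nonzero(flat == 0, as_tuple=False).flatten()
    total = torch.tensor([flat.numel()], device=flat.device)
    return torch.cat([starts, total]).to(torch.int32)
```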
ArthurZucker simplify and improve?
12bad1b3
ArthurZucker flash attention is kinda broken on recent cuda version so allow the o…
c0b600a5
ArthurZucker force-pushed from d35eb475 to c0b600a5 205 days ago
ArthurZucker Merge branch 'main' into kernels-flash-attn
5c648746
kadirnar commented on 2025-07-21
ArthurZucker fix!
11e50001
ArthurZucker protect kernels import
1c073509
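A minimal sketch of guarding an optional dependency so that importing transformers does not fail when the `kernels` package is absent; the flag and helper names below are illustrative.

```python
try:
    import kernels  # noqa: F401

    _kernels_available = True
except ImportError:
    # The package is optional: record its absence instead of raising at import time.
    _kernels_available = False


def is_kernels_available() -> bool:
    return _kernels_available
```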
MekkCyber approved these changes on 2025-07-21
Cyrilvallez commented on 2025-07-22
ArthurZucker update
cdaa1eb6
ArthurZucker properly parse generation config being passed
767d5852
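A hedged sketch of normalizing a generation config passed either as a ready object or as loose keyword arguments; `resolve_generation_config` is a hypothetical helper, not the PR's code.

```python
from transformers import GenerationConfig


def resolve_generation_config(generation_config=None, **kwargs):
    # Accept either a GenerationConfig instance or plain kwargs and normalize.
    if generation_config is None:
        generation_config = GenerationConfig(**kwargs)
    else:
        generation_config.update(**kwargs)
    return generation_config
```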
ArthurZucker Merge branch 'kernels-flash-attn' of github.com:huggingface/transform…
10f866e1
ArthurZucker revert and update
c75c5398
ArthurZucker add two tests
a2f3126e
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into kerne…
63b01c31
ArthurZucker some fixes
85829d7d
ArthurZucker fix test FA2
56981a55
ArthurZucker takes comment into account
b3f7a49e
ArthurZucker fixup
21e07f77
ArthurZucker revert changes
a8b7ec65
ArthurZucker revert the clone, it is only needed because the metal kernel is not d…
f111d338
zucchini-nlp [docs] update attention implementation and cache docs (#39547)
cd98c1fe
ArthurZucker fix mps on our side for now
f457a085
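To illustrate the shape of an MPS workaround, a sketch that falls back to PyTorch SDPA on Apple Silicon instead of calling the hub kernel; `attention_dispatch` and `kernel_fn` are illustrative names.

```python
import torch
import torch.nn.functional as F


def attention_dispatch(query, key, value, kernel_fn):
    # Work around the metal kernel issue by using SDPA on MPS devices for now.
    if query.device.type == "mps":
        return F.scaled_dot_product_attention(query, key, value)
    return kernel_fn(query, key, value)
```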
ArthurZucker commented on 2025-07-22
ArthurZucker Update src/transformers/integrations/flash_paged.py
38d241b4
ArthurZucker Merge branches 'main' and 'kernels-flash-attn' of github.com:huggingf…
cb58187b
ArthurZucker no qa
c0f4f099
ArthurZucker enabled auto-merge (squash) 204 days ago
Auto-merge disabled 204 days ago (manually disabled by user)
ArthurZucker merged efceeaf2 into main 204 days ago
ArthurZucker deleted the kernels-flash-attn branch 204 days ago
