transformers
🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models
#38108
Merged

vasqu merged 71 commits into main from vas-enc-dec-attn-refactor
vasqu starting attn refactor for encoder decoder models via bart (eager + s…
5e7df835
vasqu flash attention works, remove unnecessary code
faf79141
vasqu flex attention support for bart!, gotta check if the renaming is not …
90a90d3f
vasqu some comments
259258d4
vasqu skip flex grad test for standalone as done with the other test
afb29345
vasqu revert flex attn rename (for now), sdpa simplify, and todos
25db34af
vasqu more todos
131de1b0
vasqu refactor mask creation for reuse
c8b8ed6c
vasqu modular attempt at biogpt
c0d83b64
github-actions marked this pull request as draft 221 days ago
vasqu first batch of other models
59cf07d9
vasqu fix attn dropout
146b02b6
vasqu fix autoformer copies
b7f0a2bd
vasqu hubert
00c27dfb
vasqu another batch of models
fc41dc20
vasqu copies/style + last round of bart models --> whisper next?
1e2b4f02
vasqu remove unnecessary _reshape function and remove copy to whisper
dccabeb4
vasqu add skip for decoder-only models out of enc-dec (same as in bart)
cecd0a41
vasqu bring back licences
ac61dd79
vasqu remove comment, added to pr read instead
a6e848d1
vasqu changed the title from "[`Attention`] Refactor Attention Interface and Enable Flex Attention" to "[`Attention`] Refactor Attention Interface for Bart-based Modelsand Enable Flex Attention" 220 days ago
vasqu changed the title from "[`Attention`] Refactor Attention Interface for Bart-based Modelsand Enable Flex Attention" to "[`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" 220 days ago
vasqu mostly docs
ddfc515e
vasqu disable sew flex attn as it's unclear attn mask for now
3e5da386
vasqu oops
9a9b1404
vasqu test fixes for enc-dec
aecd5e2b
vasqu torch fx fixes + try at flex attn
7bdb6920
vasqu skip on mbart
f8260e67
vasqu some more fixes
598a5669
vasqu musicgen skip / delete old attn class logic + sdpa compose compile skip
61b648f1
vasqu disable flex attn for musicgen, not worth the effort
43169910
vasqu more fixes and style
69371062
vasqu flex attention test for dropout and encoder decoder that dont have ma…
4f123474
vasqu informer fixes
05e38b12
vasqu the weirdest thing I've encountered yet...
9a8d4e47
vasqu style
2055759f
vasqu remove empty tensor attempt, found core root in previous commits
adc808d2
vasqu disable time series due to tests being very text centric on inputs
3d23455a
vasqu add speech to text to be ignoring the other attns, also due to tests
8f9de868
vasqu update docs
b94c9664
vasqu remaining issues resolved ?
6f813cd1
vasqu update docs for current state --> nllb moe and pegasus x sdpa is ques…
3be2a9d4
vasqu marked this pull request as ready for review 218 days ago
github-actions requested a review from ArthurZucker 218 days ago
github-actions requested a review from eustlb 218 days ago
vasqu some models have not set the is_causal flag...
6dbd77ad
ArthurZucker commented on 2025-05-16
vasqu change dtype in softmax tol old behaviour + some modular fixes
dd3d3077
vasqu I hate it but it is what it is
d77ea86f
vasqu fixes from main for bart
71f7f1b2
vasqu forgot this one
8a43566d
vasqu some model fixes
270c42ab
vasqu style
cc6cae0d
vasqu Merge branch 'main' into vas-enc-dec-attn-refactor
613d7eaa
vasqu current status
6369055e
vasqu marian works now
66c93c14
vasqu fixing some copies
f8368cf1
vasqu some copy fixes + time series x informer
f34d11db
vasqu last models possibly and fixes on style/copies
b2a29872
vasqu Merge branch 'main' into vas-enc-dec-attn-refactor
dc180b25
vasqu some post merge fixes
917d3c9a
vasqu more fixes
776e3ca3
vasqu make attention interface callable and move warnings there
a27bfb9a
vasqu style lol
a066c856
vasqu add comment to "unsupported"
ece5b090
ArthurZucker commented on 2025-05-21
vasqu remove callable interface and change interface warnings + some copies
ffdc5660
vasqu fix
63e38fa5
ArthurZucker commented on 2025-05-21
vasqu ternary is ugly af, make it simpler
c8e10d16
vasqu how did that happen
47425837
vasqu changed the title from "[`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" to "🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" 213 days ago
vasqu commented on 2025-05-21
vasqu fix flex attn test
b43f3fd9
vasqu failing the test
e8a91393
ArthurZucker approved these changes on 2025-05-21
vasqu no more fallback! fixing copies next
ab0754f7
vasqu style + attn fixed
c7c14997
vasqu fixing copies and mask creation
e62a8ace
vasqu wrong copy
cd39964e
vasqu fixup tests and disable flex attn for now
c450a3d2
vasqu changed the title from "🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" to "🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models" 212 days ago
vasqu Merge branch 'main' into vas-enc-dec-attn-refactor
aca05b79
vasqu fixup last tests?
22114cd8
vasqu merged d95c864a into main 212 days ago
vasqu deleted the vas-enc-dec-attn-refactor branch 212 days ago
