🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models #38108
starting attn refactor for encoder decoder models via bart (eager + s…
5e7df835
flash attention works, remove unnecessary code
faf79141
flex attention support for bart!, gotta check if the renaming is not …
90a90d3f
some comments
259258d4
skip flex grad test for standalone as done with the other test
afb29345
revert flex attn rename (for now), sdpa simplify, and todos
25db34af
more todos
131de1b0
refactor mask creation for reuse
c8b8ed6c
modular attempt at biogpt
c0d83b64
first batch of other models
59cf07d9
fix attn dropout
146b02b6
fix autoformer copies
b7f0a2bd
hubert
00c27dfb
another batch of models
fc41dc20
copies/style + last round of bart models --> whisper next?
1e2b4f02
remove unnecessary _reshape function and remove copy to whisper
dccabeb4
add skip for decoder-only models out of enc-dec (same as in bart)
cecd0a41
bring back licences
ac61dd79
remove comment, added to pr read instead
a6e848d1
vasqu
changed the title from "[`Attention`] Refactor Attention Interface and Enable Flex Attention" to "[`Attention`] Refactor Attention Interface for Bart-based Modelsand Enable Flex Attention" 220 days ago
vasqu
changed the title from "[`Attention`] Refactor Attention Interface for Bart-based Modelsand Enable Flex Attention" to "[`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" 220 days ago
mostly docs
ddfc515e
disable sew flex attn as it's unclear attn mask for now
3e5da386
oops
9a9b1404
test fixes for enc-dec
aecd5e2b
torch fx fixes + try at flex attn
7bdb6920
skip on mbart
f8260e67
some more fixes
598a5669
musicgen skip / delete old attn class logic + sdpa compose compile skip
61b648f1
disable flex attn for musicgen, not worth the effort
43169910
more fixes and style
69371062
flex attention test for dropout and encoder decoder that dont have ma…
4f123474
informer fixes
05e38b12
the weirdest thing I've encountered yet...
9a8d4e47
style
2055759f
remove empty tensor attempt, found core root in previous commits
adc808d2
disable time series due to tests being very text centric on inputs
3d23455a
add speech to text to be ignoring the other attns, also due to tests
8f9de868
update docs
b94c9664
remaining issues resolved ?
6f813cd1
update docs for current state --> nllb moe and pegasus x sdpa is ques…
3be2a9d4
vasqu
marked this pull request as ready for review 218 days ago
some models have not set the is_causal flag...
6dbd77ad
change dtype in softmax tol old behaviour + some modular fixes
dd3d3077
I hate it but it is what it is
d77ea86f
fixes from main for bart
71f7f1b2
forgot this one
8a43566d
some model fixes
270c42ab
style
cc6cae0d
Merge branch 'main' into vas-enc-dec-attn-refactor
613d7eaa
current status
6369055e
marian works now
66c93c14
fixing some copies
f8368cf1
some copy fixes + time series x informer
f34d11db
last models possibly and fixes on style/copies
b2a29872
Merge branch 'main' into vas-enc-dec-attn-refactor
dc180b25
some post merge fixes
917d3c9a
more fixes
776e3ca3
make attention interface callable and move warnings there
a27bfb9a
style lol
a066c856
add comment to "unsupported"
ece5b090
remove callable interface and change interface warnings + some copies
ffdc5660
fix
63e38fa5
ternary is ugly af, make it simpler
c8e10d16
how did that happen
47425837
vasqu
changed the title from "[`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" to "🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" 213 days ago
vasqu
commented
on 2025-05-21
fix flex attn test
b43f3fd9
failing the test
e8a91393
no more fallback! fixing copies next
ab0754f7
style + attn fixed
c7c14997
fixing copies and mask creation
e62a8ace
wrong copy
cd39964e
fixup tests and disable flex attn for now
c450a3d2
vasqu
changed the title from "🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention" to "🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models" 212 days ago
Merge branch 'main' into vas-enc-dec-attn-refactor
aca05b79
fixup last tests?
22114cd8
vasqu
merged
d95c864a
into main 212 days ago
vasqu
deleted the vas-enc-dec-attn-refactor branch 212 days ago
Assignees
No one assigned