Add support for Flex Attention #1675
adding flex attention
f4165392
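For context, FlexAttention (available in torch >= 2.5 under `torch.nn.attention.flex_attention`) expresses attention variants as small callables passed into a single kernel. A minimal sketch of the API this PR builds on; the shapes, dtype, and device below are illustrative only:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Illustrative shapes: (batch, n_heads, seq_len, head_dim).
q = torch.randn(2, 8, 128, 64, device='cuda', dtype=torch.bfloat16)
k = torch.randn(2, 8, 128, 64, device='cuda', dtype=torch.bfloat16)
v = torch.randn(2, 8, 128, 64, device='cuda', dtype=torch.bfloat16)

def causal_score_mod(score, b, h, q_idx, kv_idx):
    # Keep scores where the query may see the key; otherwise mask with -inf.
    return torch.where(q_idx >= kv_idx, score, -float('inf'))

out = flex_attention(q, k, v, score_mod=causal_score_mod)
```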
registering score mods
ac3a8843
registering attention mask mods
31b27e23
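The two commits above register score mods and mask mods so they can be looked up by name. A minimal sketch of the idea, assuming a plain dict-based registry (llm-foundry's actual registry utilities may differ):

```python
from typing import Callable, Dict

import torch

# Hypothetical dict registries keyed by name; the real registry API may differ.
flex_attention_score_mods: Dict[str, Callable] = {}
flex_attention_mask_mods: Dict[str, Callable] = {}

def register_score_mod(name: str) -> Callable[[Callable], Callable]:
    def decorator(fn: Callable) -> Callable:
        flex_attention_score_mods[name] = fn
        return fn
    return decorator

@register_score_mod('causal')
def causal_score_mod(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float('inf'))
```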
Merge branch 'mosaicml:main' into shashank/flexattention
c8fffa54
bug_fix
86dce3b8
bug_fix
cb8f4a6a
lint
902850a7
configuring test
9c9708d8
configuring tests
f1ff4308
bug fix
e537f5a1
fixing alibi
c527dd71
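ALiBi maps naturally onto a score_mod: each head subtracts a slope-scaled query-key distance from the attention score. A sketch of the general pattern (the slopes tensor and its layout are assumptions here, not this PR's exact code):

```python
import torch

def generate_alibi_score_mod(alibi_slopes: torch.Tensor):
    # `alibi_slopes` is assumed to be a (n_heads,) tensor of per-head slopes.
    def alibi_score_mod(score, b, h, q_idx, kv_idx):
        # Penalize the score in proportion to the query-key distance.
        return score - alibi_slopes[h] * (q_idx - kv_idx).abs()
    return alibi_score_mod
```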
Merge branch 'mosaicml:main' into shashank/flexattention
15e303e6
configuring further tests
c4ef5d9e
refactoring
6b374271
adding warnings and errors
e30fe7a5
gating tests on torch version
924a53c3
Merge branch 'mosaicml:main' into shashank/flexattention
57048e33
reorganizing function defs
67a2aeae
refactoring
04f3a629
passing in dicts of mask and score mods
ab6c58c0
making mask and score mods configurable via yaml
3b3827d8
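With the mods registered by name, the attention config can select them from YAML. The snippet below is a hypothetical illustration of what such a config could look like; the actual key names and schema introduced in this PR may differ:

```yaml
# Hypothetical config shape; actual keys may differ.
model:
  attn_config:
    attn_impl: flex
    flex_attn_mod_list:
      - mod_type: mask
        mod_name: sliding_window_mask
        mod_kwargs:
          sliding_window_size: 1024
      - mod_type: score
        mod_name: alibi_score_mod
```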
Merge branch 'mosaicml:main' into shashank/flexattention
be43e8d1
adding torch.compile
2264f91e
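FlexAttention is designed to be used through torch.compile; without compilation it falls back to a slower reference path. A sketch of compiling the kernel once and reusing the compiled callable:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compile once (e.g. at model init) and reuse, so all attention layers share
# the same compiled kernel instead of re-tracing on every call.
compiled_flex_attention = torch.compile(flex_attention)
```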
..
e274d9f8
..
a26bb4f8
undoing comment out
d5ab7d35
Merge branch 'mosaicml:main' into shashank/flexattention
d40e978d
adding torch compile
5f13e7be
temporary commit commenting out block mask and score mod
ca8e1738
undoing prev temp commit
f5486ff0
Merge branch 'mosaicml:main' into shashank/flexattention
fdced3a9
speeding up block mask generation
c53db63f
precompiling create block mask
ec5900df
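Block-mask construction is itself expensive, so it helps to compile `create_block_mask` as well and build the mask once per batch rather than once per layer. A sketch, assuming torch >= 2.5:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

# Compiling block-mask creation avoids materializing the full mask eagerly.
compiled_create_block_mask = torch.compile(create_block_mask)

def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# B=None / H=None broadcast the same mask over batch and heads.
block_mask = compiled_create_block_mask(
    causal_mask_mod, B=None, H=None, Q_LEN=2048, KV_LEN=2048, device='cuda',
)
```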
minor
02ad3b6c
compiling mask and flex attn once for the entire model
13a5fc8c
..
2ae60274
..
0c5150a6
making sequence id transforms configurable
ff28304e
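Sequence-id (packed-document) masking also fits the mask_mod interface: a token may only attend to keys from the same packed sequence. A sketch of the standard pattern, with the `sequence_id` layout as an assumption:

```python
import torch

def generate_sequence_id_mask_mod(sequence_id: torch.Tensor):
    # `sequence_id` is assumed to be an int tensor of shape (batch, seq_len)
    # labelling which packed document each token belongs to.
    def sequence_id_mask_mod(b, h, q_idx, kv_idx):
        return sequence_id[b, q_idx] == sequence_id[b, kv_idx]
    return sequence_id_mask_mod
```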
..
23ba20f4
..
72c45ae6
..
73066a45
..
9f616f77
converting mods from dict to list
94ecade5
switching off seq id masking when so configured
4b301302
fix bug
9daf0680
fix bug
67aa9001
adding global and local window mask
65a0425a
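A sketch of a combined local-window plus global-token mask_mod, the kind of mask added in the commit above; the parameter names and exact window semantics are assumptions:

```python
def generate_local_global_mask_mod(sliding_window_size: int, global_window_size: int):
    def local_global_mask_mod(b, h, q_idx, kv_idx):
        causal = q_idx >= kv_idx
        in_local_window = (q_idx - kv_idx) <= sliding_window_size
        is_global_token = kv_idx < global_window_size
        return causal & (in_local_window | is_global_token)
    return local_global_mask_mod
```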
..
3443b697
ShashankMosaicML changed the title from "[WIP]: Shashank/flexattention" to "[WIP]: Add support for Flex Attention" 1 year ago
fixing test
f6b3705b
..
d5ff1386
flex attn softcap: only int values supported
43cb0d1c
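Logit soft-capping is a one-line score_mod; per the commit above, the flex path restricts the softcap value to ints. A sketch of the capping itself:

```python
import torch

def generate_softcap_score_mod(attn_logit_softcapping: int):
    def softcap_score_mod(score, b, h, q_idx, kv_idx):
        # Squash scores into (-softcap, +softcap) before the softmax.
        return attn_logit_softcapping * torch.tanh(score / attn_logit_softcapping)
    return softcap_score_mod
```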
..
f623a1f0
..
0fea56a7
..
eb6e7924
..
5852da08
Merge branch 'mosaicml:main' into shashank/flexattention
615a9044
simplifying design
04740270
removing check_seq_id_attn_mask
70aa0c76
..
fc8a1202
..
5f880939
fixing tests
661f7f61
..
fef3a5d4
..
f6c66e81
ShashankMosaicML changed the title from "[WIP]: Add support for Flex Attention" to "Add support for Flex Attention" 1 year ago
Merge branch 'main' into shashank/flexattention
eacca42a
allowing block overrides for flex attention
4385f18c
..
e17d1ff8
configuring tests, fixing bugs
58760fcf
fixing bug when using past kv caches
f4ad493f
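With a past KV cache, query positions are offset relative to the keys (queries cover only the new tokens while keys span the full cache), and mask/score mods need to account for that. A sketch of the general pattern, not the specific fix in this commit:

```python
def generate_causal_mask_mod_with_offset(cached_len: int):
    # Query index 0 corresponds to absolute position `cached_len`.
    def causal_mask_mod(b, h, q_idx, kv_idx):
        return (q_idx + cached_len) >= kv_idx
    return causal_mask_mod
```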
bug fix
67f9aae6
..
5fcbc182
..
8dfdedb2
fixing score mod bug
8912cb26
..
18c4bb9b
..
bf1cb6c7
..
5093efd0
..
96b8f82e
..
f1ad991e
..
18afcc5f
configuring with torch 2.5.1 and 2.6.0.dev
434aa83e
configuring more tests with torch 2.5.1 and 2.6.0.dev
216fcb90
..
438e0f36
..
2bb25ee5
..
9831b5eb
..
ad601e47
..
77115c51
figuring out d_model and seq lengths for which flex attention works
dfde51bd
adding todos
d1d04cee
Merge branch 'main' into shashank/flexattention
5eca05fb
adding test for local global attention
718d89de
Merge branch 'main' into shashank/flexattention
135abd73
Merge branch 'main' into shashank/flexattention
369e818c
Merge branch 'main' into shashank/flexattention
8a62ca40
Merge branch 'main' into shashank/flexattention
5d67b9cb
Merge branch 'main' into shashank/flexattention
05cf0438
..
e221c320
..
4bc6f7cb
..
45fc516d
..
70f928a8
Merge branch 'main' into shashank/flexattention
397ca38f
Merge branch 'main' into shashank/flexattention
8f276d0e