llm-foundry
Add support for Flex Attention #1675 (Open)

ShashankMosaicML adding flex attention
f4165392
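
For context, the primitive being integrated here is PyTorch's `torch.nn.attention.flex_attention` (available from torch 2.5). A minimal sketch of the upstream API, not llm-foundry's integration code:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 8, 256, 64  # batch, heads, sequence length, head dim
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# With no score_mod or block_mask this reduces to plain full attention.
out = flex_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 256, 64])
```
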
ShashankMosaicML registrifying score mods
ac3a8843
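
llm-foundry exposes pluggable components through registries, so "registrifying" presumably means score mods become registry entries that can be looked up by name; the registry itself isn't visible on this page. The `score_mod` contract is fixed by upstream PyTorch, though. A sketch:

```python
import torch

# The score_mod contract flex_attention expects: a callable taking the raw
# attention score plus (batch, head, q_idx, kv_idx) index tensors and
# returning the modified score.
def identity_score_mod(score, b, h, q_idx, kv_idx):
    return score

# Illustrative example (not this PR's code): a distance-based bias.
def distance_bias_score_mod(score, b, h, q_idx, kv_idx):
    return score - 0.01 * torch.abs(q_idx - kv_idx)
```
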
ShashankMosaicML registrifying attention mask mods
31b27e23
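
The companion contract for mask mods, again a sketch of the upstream API rather than this PR's registry code:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# The mask_mod contract: (batch, head, q_idx, kv_idx) -> bool, where True
# means "this query position may attend to this key position".
def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# A mask_mod is materialized once into a sparse BlockMask, then passed in.
S = 256
block_mask = create_block_mask(causal_mask_mod, B=None, H=None,
                               Q_LEN=S, KV_LEN=S, device="cpu")
q = k = v = torch.randn(2, 8, S, 64)
out = flex_attention(q, k, v, block_mask=block_mask)
```
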
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
c8fffa54
ShashankMosaicML bug_fix
86dce3b8
ShashankMosaicML bug_fix
cb8f4a6a
ShashankMosaicML lint
902850a7
ShashankMosaicML configuring test
9c9708d8
ShashankMosaicML configuring tests
f1ff4308
ShashankMosaicML bug fix
e537f5a1
ShashankMosaicML fixing alibi
c527dd71
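
ALiBi is a natural fit for a `score_mod`. A hedged sketch of the standard flex-attention formulation; llm-foundry's own slope computation may differ in detail:

```python
import torch

# Illustrative per-head ALiBi slopes (the usual 2^(-8i/n) schedule).
n_heads = 8
alibi_slopes = 2.0 ** (-8.0 * torch.arange(1, n_heads + 1) / n_heads)

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # kv_idx - q_idx is <= 0 under a causal mask, so distant keys are
    # penalized in proportion to the per-head slope.
    return score + alibi_slopes[h] * (kv_idx - q_idx)
```
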
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
15e303e6
ShashankMosaicML configuring further tests
c4ef5d9e
ShashankMosaicML refactoring
6b374271
ShashankMosaicML adding warnings and errors
e30fe7a5
ShashankMosaicML gating tests on torch version
924a53c3
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
57048e33
ShashankMosaicML reorganizing function defs
67a2aeae
ShashankMosaicML refactoring
04f3a629
ShashankMosaicML passing in dicts of mask and score mods
ab6c58c0
ShashankMosaicML making mask and score mods configurable via yaml
3b3827d8
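
The resulting YAML schema isn't shown on this page; once parsed, a config along these lines is plausible (every key name below is an assumption, not the PR's actual schema):

```python
# Hypothetical parsed-YAML attention config; all keys are illustrative.
attn_config = {
    'attn_impl': 'flex',
    'flex_attn_mod_list': [
        {'mod_name': 'causal_mask', 'mod_kwargs': {}},
        {'mod_name': 'alibi_score_mod', 'mod_kwargs': {'n_heads': 8}},
    ],
}
```
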
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
be43e8d1
ShashankMosaicML adding torch.compile
2264f91e
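
flex_attention only reaches fused-kernel performance under `torch.compile`; eager mode runs a slower reference path. A sketch of the expected usage (CUDA assumed):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compiling is the expected way to use flex_attention in training.
flex_attention_compiled = torch.compile(flex_attention)

q = k = v = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.float16)
out = flex_attention_compiled(q, k, v)
```
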
ShashankMosaicML ..
e274d9f8
ShashankMosaicML ..
a26bb4f8
ShashankMosaicML undoing comment out
d5ab7d35
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
d40e978d
ShashankMosaicML adding torch compile
5f13e7be
ShashankMosaicML temporary commit commenting out block mask and score mod
ca8e1738
ShashankMosaicML undoing prev temp commit
f5486ff0
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
fdced3a9
ShashankMosaicML speeding up block mask generation
c53db63f
ShashankMosaicML precompiling create block mask
ec5900df
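
`create_block_mask` itself can be expensive when masks are rebuilt every step (e.g. when they depend on the batch's sequence ids), and it is itself compilable. A sketch:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Compiling create_block_mask amortizes the cost of mask construction when
# it has to run on every training step.
create_block_mask_compiled = torch.compile(create_block_mask)
block_mask = create_block_mask_compiled(causal_mask_mod, B=None, H=None,
                                        Q_LEN=4096, KV_LEN=4096, device="cuda")
```
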
ShashankMosaicML minor
02ad3b6c
ShashankMosaicML compiling mask and flex attn once for the entire model
13a5fc8c
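
A hedged sketch of the "compile once for the entire model" idea: hold single compiled artifacts at module scope and have every attention layer call through them. The function name and signature below are illustrative, not the PR's API:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# One compiled flex_attention and one compiled create_block_mask, shared by
# all layers instead of each layer paying its own compilation cost.
_compiled_flex_attention = torch.compile(flex_attention)
_compiled_create_block_mask = torch.compile(create_block_mask)

def flex_attn_fn(q, k, v, mask_mod, score_mod=None):
    # Hypothetical helper (not the PR's API): build the mask, then attend.
    block_mask = _compiled_create_block_mask(
        mask_mod, B=None, H=None, Q_LEN=q.shape[-2], KV_LEN=k.shape[-2],
        device=str(q.device),
    )
    return _compiled_flex_attention(q, k, v, score_mod=score_mod,
                                    block_mask=block_mask)
```
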
ShashankMosaicML ..
2ae60274
ShashankMosaicML ..
0c5150a6
ShashankMosaicML making sequence id transforms configurable
ff28304e
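
Sequence-id masking is the packed-batch case: each token carries a document id and may only attend within its own document. A hedged sketch (the closure shape is illustrative, not the PR's transform API):

```python
import torch

# `sequence_id` has shape (batch, seq_len): one document id per token.
def make_sequence_id_mask_mod(sequence_id: torch.Tensor):
    def sequence_id_mask_mod(b, h, q_idx, kv_idx):
        same_doc = sequence_id[b, q_idx] == sequence_id[b, kv_idx]
        return same_doc & (q_idx >= kv_idx)
    return sequence_id_mask_mod
```
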
ShashankMosaicML ..
23ba20f4
ShashankMosaicML ..
72c45ae6
ShashankMosaicML ..
73066a45
ShashankMosaicML ..
9f616f77
ShashankMosaicML converting mods from dict to list
94ecade5
ShashankMosaicML switching off seq id masking if configured so
4b301302
ShashankMosaicML fix bug
9daf0680
ShashankMosaicML fix bug
67aa9001
ShashankMosaicML adding global and local window mask
65a0425a
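
A hedged sketch of a combined global + local-window mask mod; the window size and global-token count are illustrative parameters:

```python
import torch

WINDOW_SIZE = 128       # illustrative sliding-window width
NUM_GLOBAL_TOKENS = 4   # illustrative count of always-visible key positions

def global_local_mask_mod(b, h, q_idx, kv_idx):
    causal = q_idx >= kv_idx
    in_window = (q_idx - kv_idx) <= WINDOW_SIZE
    is_global = kv_idx < NUM_GLOBAL_TOKENS
    return causal & (in_window | is_global)
```
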
ShashankMosaicML ..
3443b697
ShashankMosaicML changed the title [WIP]: Shashank/flexattention → [WIP]: Add support for Flex Attention (1 year ago)
ShashankMosaicML fixing test
f6b3705b
ShashankMosaicML ..
d5ff1386
ShashankMosaicML flex attn softcap only int
43cb0d1c
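
Logit soft-capping as a `score_mod`. The int-only restriction plausibly keeps the cap a compile-time Python scalar under torch.compile, but that rationale is an inference, not something stated on this page:

```python
import torch

SOFTCAP = 50  # int, per the restriction in the commit above

def softcap_score_mod(score, b, h, q_idx, kv_idx):
    # Squash scores smoothly into (-SOFTCAP, SOFTCAP).
    return SOFTCAP * torch.tanh(score / SOFTCAP)
```
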
ShashankMosaicML ..
f623a1f0
ShashankMosaicML ..
0fea56a7
ShashankMosaicML ..
eb6e7924
ShashankMosaicML ..
5852da08
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
615a9044
ShashankMosaicML simplifying design
04740270
ShashankMosaicML removing check_seq_id_attn_mask
70aa0c76
ShashankMosaicML ..
fc8a1202
ShashankMosaicML ..
5f880939
ShashankMosaicML fixing tests
661f7f61
ShashankMosaicML ..
fef3a5d4
ShashankMosaicML ..
f6c66e81
ShashankMosaicML changed the title [WIP]: Add support for Flex Attention → Add support for Flex Attention (1 year ago)
ShashankMosaicML Merge branch 'main' into shashank/flexattention
eacca42a
ShashankMosaicML allowing block overrides for flex attention
4385f18c
ShashankMosaicML ..
e17d1ff8
ShashankMosaicML configuring tests, fixing bugs
58760fcf
ShashankMosaicML fixing bug when using past kv caches
f4ad493f
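
The kv-cache subtlety: during decoding, `q_idx` indexes only the new query tokens while `kv_idx` spans cache plus new tokens, so causal comparisons must fold in the cache length. A hedged sketch with an illustrative `past_len` parameter:

```python
# `past_len` is the number of cached key/value positions (illustrative).
def make_cached_causal_mask_mod(past_len: int):
    def cached_causal_mask_mod(b, h, q_idx, kv_idx):
        # Shift query positions by the cache length before comparing.
        return (q_idx + past_len) >= kv_idx
    return cached_causal_mask_mod
```
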
ShashankMosaicML bug fix
67f9aae6
ShashankMosaicML ..
5fcbc182
ShashankMosaicML ..
8dfdedb2
ShashankMosaicML fixing score mod bug
8912cb26
ShashankMosaicML ..
18c4bb9b
ShashankMosaicML ..
bf1cb6c7
ShashankMosaicML ..
5093efd0
ShashankMosaicML ..
96b8f82e
ShashankMosaicML ..
f1ad991e
ShashankMosaicML ..
18afcc5f
ShashankMosaicML configuring with torch 2.5.1 and 2.6.0.dev
434aa83e
ShashankMosaicML configuring more tests with torch 2.5.1 and 2.6.0.dev
216fcb90
ShashankMosaicML ..
438e0f36
ShashankMosaicML ..
2bb25ee5
ShashankMosaicML ..
9831b5eb
ShashankMosaicML ..
ad601e47
ShashankMosaicML ..
77115c51
ShashankMosaicML figuring out d_model and seq lengths for which flex attention works
dfde51bd
ShashankMosaicML adding todos
d1d04cee
ShashankMosaicML Merge branch 'main' into shashank/flexattention
5eca05fb
ShashankMosaicML adding test for local global attention
718d89de
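
A hedged sketch of how such a test can be structured: build the equivalent dense boolean mask and compare flex_attention against scaled_dot_product_attention. Shapes and tolerances are illustrative, not copied from the PR's test:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 1, 2, 128, 32
WINDOW, N_GLOBAL = 16, 2
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

def mask_mod(b, h, q_idx, kv_idx):
    return (q_idx >= kv_idx) & (((q_idx - kv_idx) <= WINDOW) | (kv_idx < N_GLOBAL))

# Same mask, written densely, to drive the SDPA reference.
idx = torch.arange(S)
dense = (idx[:, None] >= idx[None, :]) & (
    ((idx[:, None] - idx[None, :]) <= WINDOW) | (idx[None, :] < N_GLOBAL)
)

block_mask = create_block_mask(mask_mod, B=None, H=None, Q_LEN=S, KV_LEN=S,
                               device="cpu")
out_flex = flex_attention(q, k, v, block_mask=block_mask)
out_ref = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=dense)
torch.testing.assert_close(out_flex, out_ref, atol=1e-4, rtol=1e-4)
```
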
ShashankMosaicML Merge branch 'main' into shashank/flexattention
135abd73
ShashankMosaicML Merge branch 'main' into shashank/flexattention
369e818c
ShashankMosaicML Merge branch 'main' into shashank/flexattention
8a62ca40
ShashankMosaicML Merge branch 'main' into shashank/flexattention
5d67b9cb
ShashankMosaicML Merge branch 'main' into shashank/flexattention
05cf0438
ShashankMosaicML ..
e221c320
ShashankMosaicML ..
4bc6f7cb
ShashankMosaicML ..
45fc516d
ShashankMosaicML ..
70f928a8
ShashankMosaicML Merge branch 'main' into shashank/flexattention
397ca38f
ShashankMosaicML Merge branch 'main' into shashank/flexattention
8f276d0e
