llm-foundry
Add support for Flex Attention #1675 (Open)

ShashankMosaicML adding flex attention
f4165392
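
For context, the primitive being integrated here is PyTorch's `torch.nn.attention.flex_attention` (available from torch 2.5). A minimal sketch of the upstream API, not llm-foundry's integration code:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 8, 256, 64  # batch, heads, sequence length, head dim
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# With no score_mod or block_mask this reduces to plain full attention.
out = flex_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 256, 64])
```
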
ShashankMosaicML registrifying score mods
ac3a8843
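
llm-foundry exposes pluggable components through registries, so "registrifying" presumably means score mods become registry entries that can be looked up by name; the registry itself isn't visible on this page. The `score_mod` contract is fixed by upstream PyTorch, though. A sketch:

```python
import torch

# The score_mod contract flex_attention expects: a callable taking the raw
# attention score plus (batch, head, q_idx, kv_idx) index tensors and
# returning the modified score.
def identity_score_mod(score, b, h, q_idx, kv_idx):
    return score

# Illustrative example (not this PR's code): a distance-based bias.
def distance_bias_score_mod(score, b, h, q_idx, kv_idx):
    return score - 0.01 * torch.abs(q_idx - kv_idx)
```
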
ShashankMosaicML registrifying attention mask mods
31b27e23
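
The companion contract for mask mods, again a sketch of the upstream API rather than this PR's registry code:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# The mask_mod contract: (batch, head, q_idx, kv_idx) -> bool, where True
# means "this query position may attend to this key position".
def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# A mask_mod is materialized once into a sparse BlockMask, then passed in.
S = 256
block_mask = create_block_mask(causal_mask_mod, B=None, H=None,
                               Q_LEN=S, KV_LEN=S, device="cpu")
q = k = v = torch.randn(2, 8, S, 64)
out = flex_attention(q, k, v, block_mask=block_mask)
```
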
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
c8fffa54
ShashankMosaicML bug_fix
86dce3b8
ShashankMosaicML bug_fix
cb8f4a6a
ShashankMosaicML lint
902850a7
ShashankMosaicML configuring test
9c9708d8
ShashankMosaicML configuring tests
f1ff4308
ShashankMosaicML bug fix
e537f5a1
ShashankMosaicML fixing alibi
c527dd71
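
ALiBi is a natural fit for a `score_mod`. A hedged sketch of the standard flex-attention formulation; llm-foundry's own slope computation may differ in detail:

```python
import torch

# Illustrative per-head ALiBi slopes (the usual 2^(-8i/n) schedule).
n_heads = 8
alibi_slopes = 2.0 ** (-8.0 * torch.arange(1, n_heads + 1) / n_heads)

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # kv_idx - q_idx is <= 0 under a causal mask, so distant keys are
    # penalized in proportion to the per-head slope.
    return score + alibi_slopes[h] * (kv_idx - q_idx)
```
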
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
15e303e6
ShashankMosaicML configuring further tests
c4ef5d9e
ShashankMosaicML refactoring
6b374271
ShashankMosaicML adding warnings and errors
e30fe7a5
ShashankMosaicML gating tests on torch version
924a53c3
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
57048e33
ShashankMosaicML reorganizing function defs
67a2aeae
ShashankMosaicML refactoring
04f3a629
ShashankMosaicML passing in dicts of mask and score mods
ab6c58c0
ShashankMosaicML making mask and score mods configurable via yaml
3b3827d8
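
The resulting YAML schema isn't shown on this page; once parsed, a config along these lines is plausible (every key name below is an assumption, not the PR's actual schema):

```python
# Hypothetical parsed-YAML attention config; all keys are illustrative.
attn_config = {
    'attn_impl': 'flex',
    'flex_attn_mod_list': [
        {'mod_name': 'causal_mask', 'mod_kwargs': {}},
        {'mod_name': 'alibi_score_mod', 'mod_kwargs': {'n_heads': 8}},
    ],
}
```
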
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
be43e8d1
ShashankMosaicML adding torch.compile
2264f91e
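
flex_attention only reaches fused-kernel performance under `torch.compile`; eager mode runs a slower reference path. A sketch of the expected usage (CUDA assumed):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compiling is the expected way to use flex_attention in training.
flex_attention_compiled = torch.compile(flex_attention)

q = k = v = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.float16)
out = flex_attention_compiled(q, k, v)
```
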
ShashankMosaicML ..
e274d9f8
ShashankMosaicML ..
a26bb4f8
ShashankMosaicML undoing comment out
d5ab7d35
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
d40e978d
ShashankMosaicML adding torch compile
5f13e7be
ShashankMosaicML temporary commit commenting out block mask and score mod
ca8e1738
ShashankMosaicML undoing prev temp commit
f5486ff0
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
fdced3a9
ShashankMosaicML speeding up block mask generation
c53db63f
ShashankMosaicML precompiling create block mask
ec5900df
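
`create_block_mask` itself can be expensive when masks are rebuilt every step (e.g. when they depend on the batch's sequence ids), and it is itself compilable. A sketch:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Compiling create_block_mask amortizes the cost of mask construction when
# it has to run on every training step.
create_block_mask_compiled = torch.compile(create_block_mask)
block_mask = create_block_mask_compiled(causal_mask_mod, B=None, H=None,
                                        Q_LEN=4096, KV_LEN=4096, device="cuda")
```
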
ShashankMosaicML minor
02ad3b6c
ShashankMosaicML compiling mask and flex attn once for the entire model
13a5fc8c
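
A hedged sketch of the "compile once for the entire model" idea: hold single compiled artifacts at module scope and have every attention layer call through them. The function name and signature below are illustrative, not the PR's API:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# One compiled flex_attention and one compiled create_block_mask, shared by
# all layers instead of each layer paying its own compilation cost.
_compiled_flex_attention = torch.compile(flex_attention)
_compiled_create_block_mask = torch.compile(create_block_mask)

def flex_attn_fn(q, k, v, mask_mod, score_mod=None):
    # Hypothetical helper (not the PR's API): build the mask, then attend.
    block_mask = _compiled_create_block_mask(
        mask_mod, B=None, H=None, Q_LEN=q.shape[-2], KV_LEN=k.shape[-2],
        device=str(q.device),
    )
    return _compiled_flex_attention(q, k, v, score_mod=score_mod,
                                    block_mask=block_mask)
```
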
ShashankMosaicML ..
2ae60274
ShashankMosaicML ..
0c5150a6
ShashankMosaicML making sequence id transforms configurable
ff28304e
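
Sequence-id masking is the packed-batch case: each token carries a document id and may only attend within its own document. A hedged sketch (the closure shape is illustrative, not the PR's transform API):

```python
import torch

# `sequence_id` has shape (batch, seq_len): one document id per token.
def make_sequence_id_mask_mod(sequence_id: torch.Tensor):
    def sequence_id_mask_mod(b, h, q_idx, kv_idx):
        same_doc = sequence_id[b, q_idx] == sequence_id[b, kv_idx]
        return same_doc & (q_idx >= kv_idx)
    return sequence_id_mask_mod
```
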
ShashankMosaicML ..
23ba20f4
ShashankMosaicML ..
72c45ae6
ShashankMosaicML ..
73066a45
ShashankMosaicML ..
9f616f77
ShashankMosaicML converting mods from dict to list
94ecade5
ShashankMosaicML switching off seq id masking if configured so
4b301302
ShashankMosaicML fix bug
9daf0680
ShashankMosaicML fix bug
67aa9001
ShashankMosaicML adding global and local window mask
65a0425a
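
A hedged sketch of a combined global + local-window mask mod; the window size and global-token count are illustrative parameters:

```python
import torch

WINDOW_SIZE = 128       # illustrative sliding-window width
NUM_GLOBAL_TOKENS = 4   # illustrative count of always-visible key positions

def global_local_mask_mod(b, h, q_idx, kv_idx):
    causal = q_idx >= kv_idx
    in_window = (q_idx - kv_idx) <= WINDOW_SIZE
    is_global = kv_idx < NUM_GLOBAL_TOKENS
    return causal & (in_window | is_global)
```
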
ShashankMosaicML ..
3443b697
ShashankMosaicML changed the title [WIP]: Shashank/flexattention → [WIP]: Add support for Flex Attention (1 year ago)
ShashankMosaicML fixing test
f6b3705b
ShashankMosaicML ..
d5ff1386
ShashankMosaicML flex attn softcap only int
43cb0d1c
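
Logit soft-capping as a `score_mod`. The int-only restriction plausibly keeps the cap a compile-time Python scalar under torch.compile, but that rationale is an inference, not something stated on this page:

```python
import torch

SOFTCAP = 50  # int, per the restriction in the commit above

def softcap_score_mod(score, b, h, q_idx, kv_idx):
    # Squash scores smoothly into (-SOFTCAP, SOFTCAP).
    return SOFTCAP * torch.tanh(score / SOFTCAP)
```
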
ShashankMosaicML ..
f623a1f0
ShashankMosaicML ..
0fea56a7
ShashankMosaicML ..
eb6e7924
ShashankMosaicML ..
5852da08
ShashankMosaicML Merge branch 'mosaicml:main' into shashank/flexattention
615a9044
ShashankMosaicML simplifying design
04740270
ShashankMosaicML removing check_seq_id_attn_mask
70aa0c76
ShashankMosaicML ..
fc8a1202
ShashankMosaicML ..
5f880939
ShashankMosaicML fixing tests
661f7f61
ShashankMosaicML ..
fef3a5d4
ShashankMosaicML ..
f6c66e81
ShashankMosaicML changed the title [WIP]: Add support for Flex Attention → Add support for Flex Attention (1 year ago)
ShashankMosaicML Merge branch 'main' into shashank/flexattention
eacca42a
ShashankMosaicML allowing block overrides for flex attention
4385f18c
ShashankMosaicML ..
e17d1ff8
ShashankMosaicML configuring tests, fixing bugs
58760fcf
ShashankMosaicML fixing bug when using past kv caches
f4ad493f
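
The kv-cache subtlety: during decoding, `q_idx` indexes only the new query tokens while `kv_idx` spans cache plus new tokens, so causal comparisons must fold in the cache length. A hedged sketch with an illustrative `past_len` parameter:

```python
# `past_len` is the number of cached key/value positions (illustrative).
def make_cached_causal_mask_mod(past_len: int):
    def cached_causal_mask_mod(b, h, q_idx, kv_idx):
        # Shift query positions by the cache length before comparing.
        return (q_idx + past_len) >= kv_idx
    return cached_causal_mask_mod
```
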
ShashankMosaicML bug fix
67f9aae6
ShashankMosaicML ..
5fcbc182
ShashankMosaicML ..
8dfdedb2
ShashankMosaicML fixing score mod bug
8912cb26
ShashankMosaicML ..
18c4bb9b
ShashankMosaicML ..
bf1cb6c7
ShashankMosaicML ..
5093efd0
ShashankMosaicML ..
96b8f82e
ShashankMosaicML ..
f1ad991e
ShashankMosaicML ..
18afcc5f
ShashankMosaicML configuring with torch 2.5.1 and 2.6.0.dev
434aa83e
ShashankMosaicML configuring more tests with torch 2.5.1 and 2.6.0.dev
216fcb90
ShashankMosaicML ..
438e0f36
ShashankMosaicML ..
2bb25ee5
ShashankMosaicML ..
9831b5eb
ShashankMosaicML ..
ad601e47
ShashankMosaicML ..
77115c51
ShashankMosaicML figuring out d_model and seq lengths for which flex attention works
dfde51bd
ShashankMosaicML adding todos
d1d04cee
ShashankMosaicML Merge branch 'main' into shashank/flexattention
5eca05fb
ShashankMosaicML adding test for local global attention
718d89de
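
A hedged sketch of how such a test can be structured: build the equivalent dense boolean mask and compare flex_attention against scaled_dot_product_attention. Shapes and tolerances are illustrative, not copied from the PR's test:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 1, 2, 128, 32
WINDOW, N_GLOBAL = 16, 2
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

def mask_mod(b, h, q_idx, kv_idx):
    return (q_idx >= kv_idx) & (((q_idx - kv_idx) <= WINDOW) | (kv_idx < N_GLOBAL))

# Same mask, written densely, to drive the SDPA reference.
idx = torch.arange(S)
dense = (idx[:, None] >= idx[None, :]) & (
    ((idx[:, None] - idx[None, :]) <= WINDOW) | (idx[None, :] < N_GLOBAL)
)

block_mask = create_block_mask(mask_mod, B=None, H=None, Q_LEN=S, KV_LEN=S,
                               device="cpu")
out_flex = flex_attention(q, k, v, block_mask=block_mask)
out_ref = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=dense)
torch.testing.assert_close(out_flex, out_ref, atol=1e-4, rtol=1e-4)
```
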
ShashankMosaicML Merge branch 'main' into shashank/flexattention
135abd73
ShashankMosaicML Merge branch 'main' into shashank/flexattention
369e818c
ShashankMosaicML Merge branch 'main' into shashank/flexattention
8a62ca40
ShashankMosaicML Merge branch 'main' into shashank/flexattention
5d67b9cb
ShashankMosaicML Merge branch 'main' into shashank/flexattention
05cf0438
ShashankMosaicML ..
e221c320
ShashankMosaicML ..
4bc6f7cb
ShashankMosaicML ..
45fc516d
ShashankMosaicML ..
70f928a8
ShashankMosaicML Merge branch 'main' into shashank/flexattention
397ca38f
ShashankMosaicML Merge branch 'main' into shashank/flexattention
8f276d0e
