Megatron-DeepSpeed
[MLM] Train script for non causal decoder #300 (Open)

thomasw21 wants to merge 298 commits into main from thomas/mlm_train_script
made into input and output tokens
6ad61b6d
added eos
9131fdd7
added eos
cb76cd31
test text_token
531ee688
test text_token
a7d11583
test text_token
0008cfb0
test text_token
f1461a83
test text_token
ada0f100
assigned array
298c9b71
assigned array
d2bdff6e
assigned array
4ec8db32
hardcoded sequence length
10a2b6d1
check again
a373a700
lintangsutawika show sentinal tokens
bdef71b0
lintangsutawika show sentinal tokens
262fd6ce
lintangsutawika show sentinal tokens
68a6a936
lintangsutawika show sentinal tokens
1c00d4bb
lintangsutawika add more special tokens
8b85f113
lintangsutawika changed how mlm data is loaded
85d204af
lintangsutawika changed how mlm data is loaded
4c842748
lintangsutawika changed how mlm data is loaded
084245e5
lintangsutawika changed how mlm data is loaded
32af10e8
lintangsutawika changed how mlm data is loaded
b6e0e636
lintangsutawika added new script
2af2e4b7
lintangsutawika added new script
cc5968e5
lintangsutawika added new script
cf0b2a0f
lintangsutawika try t5 dataset
fc150a05
lintangsutawika try t5 dataset
039f90f2
lintangsutawika try t5 dataset
7364781e
lintangsutawika try t5 dataset
5b1100a4
lintangsutawika try t5 dataset
45102a93
lintangsutawika try t5 dataset
7b2ebbf7
lintangsutawika try t5 dataset
fe8b3dc0
lintangsutawika try t5 dataset
f456725e
lintangsutawika try t5 dataset
ae73d8cf
lintangsutawika try t5 dataset
fae6a0bd
lintangsutawika try t5 dataset
81858424
lintangsutawika try t5 dataset
9deef493
lintangsutawika developing
1e78a4bd
lintangsutawika developing
9070929d
lintangsutawika developing
56c69de0
lintangsutawika developing
d1ca9143
lintangsutawika developing
13af6234
lintangsutawika developing
dbc555e1
lintangsutawika developing
12b209dd
lintangsutawika test to see output of get_ltor_masks_and_position_ids
698eff05
lintangsutawika test to see output of get_ltor_masks_and_position_ids
dae3cc6c
add new script
5c109c3c
add new script
2fc99951
add new script
ee7af99a
changed settings
b6701a85
changed settings
2283e581
tidy up
9d00a49f
changed tokenizer and position embedding
0298fde9
modifying mlm to reflect original implementation
bde07f08
minor fix
4c0ca2e1
minor fix
0c05596d
minor fix
30f69248
minor fix
84408ef0
minor fix
ad964c58
minor fix
45899e98
minor fix
0b945972
minor fix
2b54cc17
minor fix
ec616272
minor fix
4448d1d3
minor fix
ecd148c7
minor fix
a99f30f0
minor fix
62d3e3e9
minor fix
a1608531
minor fix
fe205f77
minor fix
d39bdaf9
minor fix
2530d3e0
minor fix
5e93c47a
minor fix
ad867998
minor fix
82c8d932
minor fix
ebf3561d
minor fix
811f9755
minor fix
de7dfc83
minor fix
be2af770
minor fix
5e7e18f4
minor fix
24d4f25d
minor fix
5926be1c
minor fix
0f18174c
minor fix
58ce7144
set correct seq len
05470d7c
refined sampling method
51a23f23
refined sampling method
43cb2f04
refined sampling method
901defc8
refined sampling method
3130d7d1
refined sampling method
18eb53d7
refined sampling method
652c545c
first commit, adding non causal mlm dataset
5a49db8e
fixed mlm dataset
81b918c9
fixed mlm dataset
95afc4f0
fixed mlm dataset
c4514d8e
fixed mlm dataset
5cca5af4
fixed mlm dataset
ae958788
minor changes
a03e59f3
removed mlm related scripts
fa1e072d
removed any scipts not related to dataset, revert arguments
e3ce0a76
added sampler and test
87e4055c
added testing data
0ae7661d
adapted test loader
71fb5aea
Update megatron/data/non_causal_mtf_dataset.py
be0cea2d
removed unused files
9daa3766
changed with impossible token
6b9e81a3
enable loading multiple indexed_dataset for each field
7feec27f
minor fix
f84f2935
data_prefix is set as dict
2778d8d8
removed sample_idx lines
61ac4b9d
change line from sample_idx to doc_idx
62e3fb13
replace shuffling _build_index_mappings with random.sample of the doc…
cb79f09e
minor changes
e9cf22a3
Muennighoff Cleanup artefacts
acd87cd5
Muennighoff Add packed preprocessing
019ed7c9
Muennighoff Use seq_length arg
7619f7a6
Muennighoff Add sources & docstrings
219209ac
added training process for t0
67424d6d
Update pretrain_t0.py
a7c424e6
thomasw21 Remove a bunch of code that's not needed
51d6c402
thomasw21 WIP
b4e374c4
thomasw21 Cleanup
0d2fdfd6
thomasw21 Add back all configs
126fa34c
thomasw21 Woops
83d24057
thomasw21 Fix tests
c93ed5ce
thomasw21 Rename testing files
528f5d34
thomasw21 Do in-place operations
8bed302d
thomasw21 Do in-place operations
bd2fede1
thomasw21 Woops
8593e425
thomasw21 Fix typo
a1eb558a
thomasw21 Add test that packing is done optimially via greedy algorithm
3bddafa8
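The greedy packing being tested above can be sketched as follows. This is a minimal illustration of the general first-fit-on-current-bin strategy, with hypothetical names, not the repository's actual implementation:

```python
# Hypothetical sketch of greedy sequence packing: append each sample to the
# current pack while it still fits within max_seq_length, otherwise close the
# pack and start a new one. Over-long samples are skipped, mirroring the
# "Silently skip samples that are too long" commit.

def greedy_pack(lengths, max_seq_length):
    """Group sample lengths into packs whose total length <= max_seq_length."""
    packs = []
    current, current_len = [], 0
    for length in lengths:
        if length > max_seq_length:
            continue  # sample cannot fit in any pack
        if current_len + length > max_seq_length:
            packs.append(current)
            current, current_len = [], 0
        current.append(length)
        current_len += length
    if current:
        packs.append(current)
    return packs
```

A test for optimality under this greedy rule would assert that each pack is filled as far as the input order allows before a new one is opened.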
thomasw21 Woops
45c94446
lintangsutawika added capabilities for padding and prefix lm index
6f28ae45
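A prefix-LM attention mask of the kind referenced here lets every position attend bidirectionally within the prefix, while positions after the prefix follow the usual causal rule. A minimal sketch (hypothetical helper, not the repo's code, which operates on batched tensors):

```python
def prefix_lm_mask(seq_length, prefix_length):
    """Return a seq_length x seq_length boolean mask.

    mask[i][j] is True when position i may attend to position j:
    always True inside the prefix (bidirectional), otherwise only
    when j <= i (causal, lower-triangular).
    """
    return [
        [j < prefix_length or j <= i for j in range(seq_length)]
        for i in range(seq_length)
    ]
```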
added adjustments and new dataset
8a4d99b7
added sentinal tokens
ea445b15
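Given the surrounding "try t5 dataset" commits, the sentinel tokens added here likely follow the T5-style span-corruption scheme: each masked span in the input is collapsed to a single sentinel token, and the targets list each sentinel followed by the tokens it replaced. A rough sketch with hypothetical names:

```python
def apply_span_corruption(tokens, spans, sentinels):
    """Replace each half-open (start, end) span with a sentinel token.

    Returns (inputs, targets): inputs are the original tokens with every
    span collapsed to its sentinel; targets interleave each sentinel with
    the tokens it stands for. Spans are assumed sorted and non-overlapping.
    """
    inputs, targets = [], []
    pos = 0
    for sentinel, (start, end) in zip(sentinels, spans):
        inputs.extend(tokens[pos:start])  # copy unmasked tokens
        inputs.append(sentinel)           # collapse the span
        targets.append(sentinel)
        targets.extend(tokens[start:end])  # span content moves to targets
        pos = end
    inputs.extend(tokens[pos:])
    return inputs, targets
```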
made into input and output tokens
40708595
modifying mlm to reflect original implementation
85e84ecb
minor fix
39222938
added sampler and test
ee6438f1
Muennighoff Enable training
a869adf5
Muennighoff Add T0 training test
5ae15ef6
Muennighoff Remove artefacts
efa55ea8
Muennighoff Remove artefacts
f45266d1
thomasw21 WIP
8029564f
thomasw21 WIP
4faa7434
thomasw21 WIP
3a6d73d1
thomasw21 WIP
ea86bc8f
thomasw21 WIP
638fc567
thomasw21 move to cpu for comparison
66d2afe8
thomasw21 Use torch_assert_equal
3794b86a
thomasw21 WIP
346b08f9
thomasw21 Take in account pad + fix inverse
4203f6cb
thomasw21 Tensor and int can't be compared vi torch_assert_equal
bcba2b71
thomasw21 Woops
57156e1d
thomasw21 Test
45d92189
thomasw21 Woops
959fc71d
thomasw21 Remove unecessary unsqueeze
27197fce
thomasw21 Add necessary unsqueeze
b7374e1c
thomasw21 I'm stupid
4f6b7d32
thomasw21 I'm stupid
960b17cb
thomasw21 Tokenizers returns None when trying to access a non existing value
2b522d11
thomasw21 Force gpt2 to have a pad token
a8fcd386
thomasw21 Add a test that the packed_masking works in the modeling side
7181de45
thomasw21 Import error
172306b0
thomasw21 Tokenizer requires to have pad token
a4854bd2
thomasw21 Turns out that test_model.py did not use deepspeed version of models
06c29a9a
thomasw21 Use train_batch instead
aba48b3f
thomasw21 Make it work via DS
a9d423a4
thomasw21 Make it work via DS
6a95e25e
thomasw21 Make it work via DS
d6e435b1
thomasw21 Make it work via DS
ca8c04a7
thomasw21 Make it work via DS
f3231db3
thomasw21 Make it work via DS
987e6b4b
thomasw21 Make it work via DS
0b27fb67
thomasw21 Woops
1ba5d4a1
thomasw21 Make it work via DS
cbab16ca
thomasw21 Make it work via DS
4defbb2c
thomasw21 Make it work via DS
412939c0
thomasw21 Maybe
17a6cc0a
thomasw21 Make it work via DS
cb90679e
thomasw21 Woops
bd4a3f07
thomasw21 Try having very strict mask
66040354
thomasw21 Try updating the kernel
d98e39a5
thomasw21 Try updating the kernel
84950834
thomasw21 Try updating the kernel
ef5d4d4d
thomasw21 Try updating the kernel
69912b3f
thomasw21 Try updating the kernel
866fc56e
thomasw21 Try updating the kernel
8e9701b3
thomasw21 Inverse causal masking
15d95faf
thomasw21 Check that the padding are ignored
fe4f806c
thomasw21 Fix test
cc2aff57
thomasw21 Probably should be in this order:
93cde870
thomasw21 Revert "Probably should be in this order:"
f6d717b4
thomasw21 Add a test checking that ScaledMaskedSoftmax custom kernel does what …
910f93b9
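Testing a fused kernel like ScaledMaskedSoftmax generally means comparing its output against a plain reference implementation. A pure-Python reference for one row of masked softmax, where masked positions receive zero probability, might look like this (a sketch only; the actual test compares the CUDA kernel against a torch implementation):

```python
import math

def masked_softmax(scores, mask, scale=1.0):
    """Reference masked softmax over one row of attention scores.

    mask[j] True means position j is excluded: it contributes nothing to
    the normalizer and gets probability 0. Assumes at least one position
    is unmasked (otherwise the normalizer would be zero).
    """
    exps = [0.0 if m else math.exp(s * scale) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

The later "Cuda kernel is not strictly equivalent" commit suggests such comparisons need a tolerance rather than exact equality.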
thomasw21 Head specific mask is not implemented
75f99ef7
thomasw21 Test something out
c34f1073
thomasw21 Test something out
ed6131aa
thomasw21 Test something out
3a846a0a
thomasw21 Test something out
5746641e
thomasw21 Test something out
292620c4
thomasw21 Test something out
0e1ef5dc
thomasw21 Test something out
964a275f
thomasw21 Test something out
8b31e9ca
thomasw21 Test something out
723a5b39
thomasw21 Test something out
65b4ea28
thomasw21 Maybe nothing is wrong
7eaced45
thomasw21 Woops
da9f3160
thomasw21 Use bloom instead
8b67bd98
thomasw21 Make MTF dataloader an infinite dataloader
84007bc2
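Making the MTF dataloader "infinite" usually means an iterator that reshuffles and restarts whenever the underlying dataset is exhausted, tracking the epoch count so the shuffle is deterministic and state can be reset. A hedged sketch, not the repository's implementation:

```python
import random

def infinite_loader(dataset, seed=0):
    """Yield samples forever, reshuffling with a per-epoch seed on each pass."""
    epoch = 0
    while True:
        order = list(range(len(dataset)))
        # Seeding with (seed + epoch) makes every epoch's order reproducible,
        # which matters when training must resume from a checkpoint.
        random.Random(seed + epoch).shuffle(order)
        for i in order:
            yield dataset[i]
        epoch += 1
```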
thomasw21 Work into moving packing logic into a dataset
273d420b
thomasw21 Woops
688d06e4
thomasw21 Woops
ddc6a61a
thomasw21 Woops
0e34e8d1
thomasw21 Woops
014b8b82
thomasw21 Woops
c53622a9
thomasw21 Woops
ea221a88
thomasw21 Woops
32749863
thomasw21 Woops
9a5bf96d
thomasw21 Woops
d1605898
thomasw21 Woops
c3ab5b95
thomasw21 Woops
f5410765
thomasw21 Requires to remember how may epochs
20be5b90
thomasw21 Find a way to reset states everytime
d9719b6d
thomasw21 Find a way to reset states everytime
4e0c4caf
thomasw21 Find a way to reset states everytime
48a55b9a
thomasw21 Find a way to reset states everytime
2e469e5a
thomasw21 Find a way to reset states everytime
74e03ec4
thomasw21 Fix bugs
f4a4733e
thomasw21 Cleanup
e1a37677
thomasw21 Merge remote-tracking branch 'official_repo/main' into thomas/mtf_tra…
efeb55a1
thomasw21 Woops
de88ab63
thomasw21 Woops
d7a6388a
thomasw21 Woops
1c2284f1
thomasw21 Woops
b759a92a
thomasw21 Woops
ef20e57a
thomasw21 Silently skip samples that are too long
5816adfb
thomasw21 Build the index from scratch everytime
37ad57e6
thomasw21 Prevent empty dataset
1572ddc9
thomasw21 Change the condition for empty slice
bebb481a
thomasw21 PR reviews
5c806992
thomasw21 Revert back changes linked to shutil.copytree
985cd028
thomasw21 Get test working
41e931a9
thomasw21 Woops
b321a349
thomasw21 Woops
0450bad8
thomasw21 Fix empty samples
de4934f5
thomasw21 Cuda kernel is not strictly equivalent
e3e21f55
thomasw21 Update tests/test_model.py
16c556c0
thomasw21 MTF optimize dataloading (#298)
f2df7715
thomasw21 Get pretrain on non causal mlm script
a45c9cd4
thomasw21 Test
606fdeb5
Base automatically changed from thomas/mtf_train_script to main 3 years ago

Reviewers: no reviews
Assignees: no one assigned
Labels / Milestone: none listed