Megatron-DeepSpeed
[MLM] Train script for non causal decoder #300 (Open)
thomasw21 wants to merge 298 commits into main from thomas/mlm_train_script

Commits:
6ad61b6d  made into input and output tokens
9131fdd7  added eos
cb76cd31  added eos
531ee688  test text_token
a7d11583  test text_token
0008cfb0  test text_token
f1461a83  test text_token
ada0f100  test text_token
298c9b71  assigned array
d2bdff6e  assigned array
4ec8db32  assigned array
10a2b6d1  hardcoded sequence length
a373a700  check again
bdef71b0  show sentinel tokens
262fd6ce  show sentinel tokens
68a6a936  show sentinel tokens
1c00d4bb  show sentinel tokens
8b85f113  add more special tokens
85d204af  changed how mlm data is loaded
4c842748  changed how mlm data is loaded
084245e5  changed how mlm data is loaded
32af10e8  changed how mlm data is loaded
b6e0e636  changed how mlm data is loaded
2af2e4b7  added new script
cc5968e5  added new script
cf0b2a0f  added new script
fc150a05  try t5 dataset
039f90f2  try t5 dataset
7364781e  try t5 dataset
5b1100a4  try t5 dataset
45102a93  try t5 dataset
7b2ebbf7  try t5 dataset
fe8b3dc0  try t5 dataset
f456725e  try t5 dataset
ae73d8cf  try t5 dataset
fae6a0bd  try t5 dataset
81858424  try t5 dataset
9deef493  try t5 dataset
1e78a4bd  developing
9070929d  developing
56c69de0  developing
d1ca9143  developing
13af6234  developing
dbc555e1  developing
12b209dd  developing
698eff05  test to see output of get_ltor_masks_and_position_ids
dae3cc6c  test to see output of get_ltor_masks_and_position_ids
5c109c3c  add new script
2fc99951  add new script
ee7af99a  add new script
b6701a85  changed settings
2283e581  changed settings
9d00a49f  tidy up
0298fde9  changed tokenizer and position embedding
bde07f08  modifying mlm to reflect original implementation
4c0ca2e1  minor fix
0c05596d  minor fix
30f69248  minor fix
84408ef0  minor fix
ad964c58  minor fix
45899e98  minor fix
0b945972  minor fix
2b54cc17  minor fix
ec616272  minor fix
4448d1d3  minor fix
ecd148c7  minor fix
a99f30f0  minor fix
62d3e3e9  minor fix
a1608531  minor fix
fe205f77  minor fix
d39bdaf9  minor fix
2530d3e0  minor fix
5e93c47a  minor fix
ad867998  minor fix
82c8d932  minor fix
ebf3561d  minor fix
811f9755  minor fix
de7dfc83  minor fix
be2af770  minor fix
5e7e18f4  minor fix
24d4f25d  minor fix
5926be1c  minor fix
0f18174c  minor fix
58ce7144  minor fix
05470d7c  set correct seq len
51a23f23  refined sampling method
43cb2f04  refined sampling method
901defc8  refined sampling method
3130d7d1  refined sampling method
18eb53d7  refined sampling method
652c545c  refined sampling method
5a49db8e  first commit, adding non causal mlm dataset
81b918c9  fixed mlm dataset
95afc4f0  fixed mlm dataset
c4514d8e  fixed mlm dataset
5cca5af4  fixed mlm dataset
ae958788  fixed mlm dataset
a03e59f3  minor changes
fa1e072d  removed mlm related scripts
e3ce0a76  removed any scripts not related to dataset, revert arguments
87e4055c  added sampler and test
0ae7661d  added testing data
71fb5aea  adapted test loader
be0cea2d  Update megatron/data/non_causal_mtf_dataset.py
9daa3766  removed unused files
6b9e81a3  changed with impossible token
7feec27f  enable loading multiple indexed_dataset for each field
f84f2935  minor fix
2778d8d8  data_prefix is set as dict
61ac4b9d  removed sample_idx lines
62e3fb13  change line from sample_idx to doc_idx
cb79f09e  replace shuffling _build_index_mappings with random.sample of the doc…
e9cf22a3  minor changes
acd87cd5  Cleanup artefacts
019ed7c9  Add packed preprocessing
7619f7a6  Use seq_length arg
219209ac  Add sources & docstrings
67424d6d  added training process for t0
a7c424e6  Update pretrain_t0.py
51d6c402  Remove a bunch of code that's not needed
b4e374c4  WIP
0d2fdfd6  Cleanup
126fa34c  Add back all configs
83d24057  Woops
c93ed5ce  Fix tests
528f5d34  Rename testing files
8bed302d  Do in-place operations
bd2fede1  Do in-place operations
8593e425  Woops
a1eb558a  Fix typo
3bddafa8  Add test that packing is done optimally via greedy algorithm
45c94446  Woops
6f28ae45  added capabilities for padding and prefix lm index
8a4d99b7  added adjustments and new dataset
ea445b15  added sentinel tokens
40708595  made into input and output tokens
85e84ecb  modifying mlm to reflect original implementation
39222938  minor fix
ee6438f1  added sampler and test
a869adf5  Enable training
5ae15ef6  Add T0 training test
efa55ea8  Remove artefacts
f45266d1  Remove artefacts
8029564f  WIP
4faa7434  WIP
3a6d73d1  WIP
ea86bc8f  WIP
638fc567  WIP
66d2afe8  move to cpu for comparison
3794b86a  Use torch_assert_equal
346b08f9  WIP
4203f6cb  Take into account pad + fix inverse
bcba2b71  Tensor and int can't be compared via torch_assert_equal
57156e1d  Woops
45d92189  Test
959fc71d  Woops
27197fce  Remove unnecessary unsqueeze
b7374e1c  Add necessary unsqueeze
4f6b7d32  I'm stupid
960b17cb  I'm stupid
2b522d11  Tokenizers returns None when trying to access a non-existing value
a8fcd386  Force gpt2 to have a pad token
7181de45  Add a test that the packed_masking works in the modeling side
172306b0  Import error
a4854bd2  Tokenizer requires a pad token
06c29a9a  Turns out that test_model.py did not use deepspeed version of models
aba48b3f  Use train_batch instead
a9d423a4  Make it work via DS
6a95e25e  Make it work via DS
d6e435b1  Make it work via DS
ca8c04a7  Make it work via DS
f3231db3  Make it work via DS
987e6b4b  Make it work via DS
0b27fb67  Make it work via DS
1ba5d4a1  Woops
cbab16ca  Make it work via DS
4defbb2c  Make it work via DS
412939c0  Make it work via DS
17a6cc0a  Maybe
cb90679e  Make it work via DS
bd4a3f07  Woops
66040354  Try having very strict mask
d98e39a5  Try updating the kernel
84950834  Try updating the kernel
ef5d4d4d  Try updating the kernel
69912b3f  Try updating the kernel
866fc56e  Try updating the kernel
8e9701b3  Try updating the kernel
15d95faf  Inverse causal masking
fe4f806c  Check that the padding is ignored
cc2aff57  Fix test
93cde870  Probably should be in this order:
f6d717b4  Revert "Probably should be in this order:"
910f93b9  Add a test checking that ScaledMaskedSoftmax custom kernel does what …
75f99ef7  Head-specific mask is not implemented
c34f1073  Test something out
ed6131aa  Test something out
3a846a0a  Test something out
5746641e  Test something out
292620c4  Test something out
0e1ef5dc  Test something out
964a275f  Test something out
8b31e9ca  Test something out
723a5b39  Test something out
65b4ea28  Test something out
7eaced45  Maybe nothing is wrong
da9f3160  Woops
8b67bd98  Use bloom instead
84007bc2  Make MTF dataloader an infinite dataloader
273d420b  Work into moving packing logic into a dataset
688d06e4  Woops
ddc6a61a  Woops
0e34e8d1  Woops
014b8b82  Woops
c53622a9  Woops
ea221a88  Woops
32749863  Woops
9a5bf96d  Woops
d1605898  Woops
c3ab5b95  Woops
f5410765  Woops
20be5b90  Requires to remember how many epochs
d9719b6d  Find a way to reset states every time
4e0c4caf  Find a way to reset states every time
48a55b9a  Find a way to reset states every time
2e469e5a  Find a way to reset states every time
74e03ec4  Find a way to reset states every time
f4a4733e  Fix bugs
e1a37677  Cleanup
efeb55a1  Merge remote-tracking branch 'official_repo/main' into thomas/mtf_tra…
de88ab63  Woops
d7a6388a  Woops
1c2284f1  Woops
b759a92a  Woops
ef20e57a  Woops
5816adfb  Silently skip samples that are too long
37ad57e6  Build the index from scratch every time
1572ddc9  Prevent empty dataset
bebb481a  Change the condition for empty slice
5c806992  PR reviews
985cd028  Revert changes linked to shutil.copytree
41e931a9  Get test working
b321a349  Woops
0450bad8  Woops
de4934f5  Fix empty samples
e3e21f55  CUDA kernel is not strictly equivalent
16c556c0  Update tests/test_model.py
f2df7715  MTF optimize dataloading (#298)
a45c9cd4  Get pretrain on non causal mlm script
606fdeb5  Test
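The packing-related commits above ("Add packed preprocessing", "Work into moving packing logic into a dataset", "Add test that packing is done optimally via greedy algorithm", "Silently skip samples that are too long") revolve around one idea: concatenating short samples into a single sequence of at most the configured sequence length so little compute is wasted on padding. A minimal sketch of that greedy strategy, with hypothetical names (`greedy_pack`, `max_seq_length`) chosen for illustration rather than taken from this PR's actual API:

```python
def greedy_pack(samples, max_seq_length):
    """Greedily pack token sequences into groups of at most max_seq_length tokens.

    Samples are consumed in order; a new pack is started whenever the next
    sample no longer fits. Samples longer than max_seq_length are skipped,
    mirroring the "Silently skip samples that are too long" commit above.
    """
    packs, current, current_len = [], [], 0
    for sample in samples:
        if len(sample) > max_seq_length:
            continue  # can never fit in any pack; skip silently
        if current_len + len(sample) > max_seq_length:
            packs.append(current)
            current, current_len = [], 0
        current.append(sample)
        current_len += len(sample)
    if current:
        packs.append(current)
    return packs


# Example: pack three token lists into sequences of at most 8 tokens.
packs = greedy_pack([[1, 2, 3], [4, 5, 6, 7], [8, 9]], max_seq_length=8)
assert packs == [[[1, 2, 3], [4, 5, 6, 7]], [[8, 9]]]
```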
Base automatically changed from thomas/mtf_train_script to main (3 years ago).
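Several of the masking commits ("Add a test that the packed_masking works in the modeling side", "Check that the padding is ignored", "Inverse causal masking") test the attention constraint that packing imposes: a token may only attend within its own packed sample, and padding attends to nothing. A minimal sketch of that constraint, assuming a segment-id convention (0 = padding) invented for illustration; the PR's real mask additionally handles the prefix-LM split (non-causal over inputs, causal over targets):

```python
import torch

def packed_attention_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    """segment_ids: [batch, seq] ints, 0 for padding, 1..N per packed sample.

    Returns a boolean mask [batch, seq, seq] that is True where attention is
    allowed: query and key positions share the same non-zero segment id.
    """
    same_segment = segment_ids.unsqueeze(-1) == segment_ids.unsqueeze(-2)
    not_padding = (segment_ids != 0).unsqueeze(-1) & (segment_ids != 0).unsqueeze(-2)
    return same_segment & not_padding


# Two samples packed into one row of length 6, with one trailing pad token.
seg = torch.tensor([[1, 1, 2, 2, 2, 0]])
mask = packed_attention_mask(seg)
assert mask[0, 0, 1] and not mask[0, 0, 2]  # cross-segment attention is blocked
assert not mask[0, 5, 5]                    # padding attends nowhere
```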
Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Milestone: No milestone