Megatron-DeepSpeed
58d92042 - add `pad-vocab-size-to` argument and tests (#255)

Commit
3 years ago
add `pad-vocab-size-to` argument and tests (#255)

* init new test
* test pad vocab size to
* add logs
* log to warning
* change TP
* fix loop
* revert
* remove hack size
* this new test should pass
* test not divisible by num tp
* Revert "remove hack size"
  This reverts commit bcc6d8de26682c7ea077f6892203411c5b790540.
* Revert "Revert "remove hack size""
  This reverts commit 8322f897b5f4456273f50f5617e8a96cc2abbd64.
* Revert "test not divisible by num tp"
  This reverts commit 92614bf7cab69e1ac2fdc4e7167c3813676cbddd.
* Revert "this new test should pass"
  This reverts commit 9e17a4f71a9c7257ef809ba53cf9955738acb3d6.
* change info to warning
* change to print
* add print
* test 2
* new print
* woups
* more
* woups
* comment
* raise errors
* woups
* pad to save vocab size
* simplify test
* assert test raised
* print error msg
* check msg error
* check error
* woups
* clean
* simplify
* remove unused print
* add comment
* add test multiple of tp size
* add print
* add check
* clean
* Update megatron/mpu/layers.py
  Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* Update megatron/tokenizer/tokenizer.py
  Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* chnage micro-batch-size
* use tiny vocab
* fix data dir
* fix arg
* change micro-batch-size
* adept input ids
* assertIn
* change micro batch size
* Fix test TP
  Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* unused var
* add test make_vocab_size_divisible_by
* fix test_tokenizer_vocab_size_multiple_of_tp_size test
* Fix padded vocab size on preprocessing scripts (#257)
* Add tokenizer options in preprocessing scripts
* This should fix the TP issue?
  Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* documentation
  Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
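
The commit message names the pieces involved (`pad-vocab-size-to`, `make_vocab_size_divisible_by`, a check that the vocab size is a multiple of the TP size, and raised errors) but does not spell out the padding rule. The following is a minimal sketch of how such padding might work, not the code from this commit: the function name `padded_vocab_size`, its parameter names, the default of 128, and the exact error conditions are all assumptions made for illustration.

```python
from typing import Optional


def padded_vocab_size(
    orig_vocab_size: int,
    make_divisible_by: int = 128,   # assumed default, for illustration only
    tp_size: int = 1,               # tensor-parallel world size
    pad_to: Optional[int] = None,   # hypothetical stand-in for pad-vocab-size-to
) -> int:
    """Return a vocab size padded so it splits evenly across TP ranks.

    Hypothetical helper: the real logic lives in the repository and may differ.
    """
    if pad_to is not None:
        # An explicit target must be able to hold every real token.
        if pad_to < orig_vocab_size:
            raise ValueError(
                f"pad_to ({pad_to}) is smaller than the vocab size ({orig_vocab_size})"
            )
        # Each TP rank must receive an equal slice of the embedding rows.
        if pad_to % tp_size != 0:
            raise ValueError(
                f"pad_to ({pad_to}) is not a multiple of the TP size ({tp_size})"
            )
        return pad_to

    # Otherwise round up to the next multiple of make_divisible_by * tp_size.
    multiple = make_divisible_by * tp_size
    return -(-orig_vocab_size // multiple) * multiple


if __name__ == "__main__":
    print(padded_vocab_size(50257))                            # default rounding
    print(padded_vocab_size(50257, tp_size=2))                 # TP-aware rounding
    print(padded_vocab_size(50257, tp_size=4, pad_to=51200))   # explicit target
```

An explicit target is useful when a checkpoint has to keep the same embedding shape across different TP degrees, which may be why the commit adds tests around both the explicit value and the divisibility check.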