add `pad-vocab-size-to` argument and tests (#255)
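For context, a minimal sketch of the padding rule this argument controls: the padded vocabulary size has to split evenly across tensor-parallel (TP) ranks so each rank holds an equal slice of the embedding. The helper name `pad_vocab_size`, its signature, and the default of 128 are illustrative assumptions, not the exact code in `megatron/tokenizer/tokenizer.py`:

```python
# Hypothetical sketch of the logic behind --pad-vocab-size-to.
# Assumption: this mirrors the usual Megatron padding convention,
# it is not the repository's exact implementation.

def pad_vocab_size(orig_vocab_size, tp_size,
                   make_vocab_size_divisible_by=128,
                   pad_vocab_size_to=None):
    if pad_vocab_size_to is not None:
        # Explicit target: it must cover the real vocab and split evenly
        # across TP ranks, otherwise raise (see "raise errors" below).
        if pad_vocab_size_to < orig_vocab_size:
            raise ValueError(
                f"pad-vocab-size-to ({pad_vocab_size_to}) is smaller than "
                f"the original vocab size ({orig_vocab_size})")
        if pad_vocab_size_to % tp_size != 0:
            raise ValueError(
                f"pad-vocab-size-to ({pad_vocab_size_to}) is not divisible "
                f"by the TP size ({tp_size})")
        return pad_vocab_size_to
    # Default behaviour: round up to the next multiple of
    # make_vocab_size_divisible_by * tp_size.
    multiple = make_vocab_size_divisible_by * tp_size
    return ((orig_vocab_size + multiple - 1) // multiple) * multiple
```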
* init new test
* test pad vocab size to
* add logs
* log to warning
* change TP
* fix loop
* revert
* remove hack size
* this new test should pass
* test not divisible by num tp
* Revert "remove hack size"
This reverts commit bcc6d8de26682c7ea077f6892203411c5b790540.
* Revert "Revert "remove hack size""
This reverts commit 8322f897b5f4456273f50f5617e8a96cc2abbd64.
* Revert "test not divisible by num tp"
This reverts commit 92614bf7cab69e1ac2fdc4e7167c3813676cbddd.
* Revert "this new test should pass"
This reverts commit 9e17a4f71a9c7257ef809ba53cf9955738acb3d6.
* change info to warning
* change to print
* add print
* test 2
* new print
* oops
* more
* oops
* comment
* raise errors
* oops
* pad to save vocab size
* simplify test
* assert error is raised
* print error msg
* check error msg
* check error
* oops
* clean
* simplify
* remove unused print
* add comment
* add test multiple of tp size
* add print
* add check
* clean
* Update megatron/mpu/layers.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* Update megatron/tokenizer/tokenizer.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* change micro-batch-size
* use tiny vocab
* fix data dir
* fix arg
* change micro-batch-size
* adapt input ids
* assertIn
* change micro batch size
* Fix test TP
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* unused var
* add test make_vocab_size_divisible_by (see the test sketch after this list)
* fix test_tokenizer_vocab_size_multiple_of_tp_size test
* Fix padded vocab size on preprocessing scripts (#257)
* Add tokenizer options in preprocessing scripts
* This should fix the TP issue?
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* documentation
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
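The tests referenced above (vocab size a multiple of the TP size, `make_vocab_size_divisible_by`, and checking the error message with `assertIn`) could look roughly like this. A hedged sketch reusing the hypothetical `pad_vocab_size` helper from the block above; test names and numbers are illustrative, not the repository's actual tests:

```python
import unittest

# Reuses the hypothetical pad_vocab_size helper sketched above.

class TestPadVocabSize(unittest.TestCase):
    def test_vocab_size_multiple_of_tp_size(self):
        # Default rounding: 50257 tokens (GPT-2 vocab), TP=4,
        # divisible-by=128 -> next multiple of 512 is 50688.
        self.assertEqual(pad_vocab_size(50257, tp_size=4), 50688)

    def test_explicit_pad_vocab_size_to(self):
        # An explicit target that covers the vocab and splits across TP=4.
        self.assertEqual(
            pad_vocab_size(50257, tp_size=4, pad_vocab_size_to=51200),
            51200)

    def test_raises_when_not_divisible_by_tp(self):
        # 50259 is not divisible by 4, so the argument must be rejected.
        with self.assertRaises(ValueError) as ctx:
            pad_vocab_size(50257, tp_size=4, pad_vocab_size_to=50259)
        self.assertIn("not divisible", str(ctx.exception))

if __name__ == "__main__":
    unittest.main()
```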