add `pad-vocab-size-to` argument and tests #255
init new test
2025ac24
test pad vocab size to
c33343d2
add logs
390b4dc1
log to warning
784b7512
change TP
6f3a4721
fix loop
1d9649af
revert
7fa5c103
remove hack size
bcc6d8de
this new test should pass
9e17a4f7
test not divisible by num tp
92614bf7
Revert "remove hack size"
8322f897
Revert "Revert "remove hack size""
6d720734
Revert "test not divisible by num tp"
84333d35
Revert "this new test should pass"
b2382d84
change info to warning
d4a15a3b
change to print
cd5e8b4e
add print
a6ee8947
test 2
f534c43f
new print
0a1167b0
woups
34bfd60d
more
50cb3ca5
woups
786e02dc
comment
20d08a85
raise errors
915bd6c7
woups
119a0d2d
pad to save vocab size
5c6dec09
simplify test
de3353fb
assert test raised
8485770a
print error msg
df244924
check msg error
46fc9dac
check error
9ffafb12
woups
1eb5baa4
clean
56af695d
simplify
3ea0c6bc
remove unused print
be2e371b
add comment
89869625
add test multiple of tp size
a72fa034
add print
1e5b2af3
add check
8d8be7ea
SaulLu
changed the title from "[WIP] set the tokenizer vocab size" to "add `pad-vocab-size-to` argument and tests" 3 years ago
SaulLu
commented
on 2022-02-28
clean
b2867a7b
stas00
commented
on 2022-02-28
stas00
commented
on 2022-02-28
Update megatron/mpu/layers.py
ef61e898
Update megatron/tokenizer/tokenizer.py
c10a3598
change micro-batch-size
fc975b44
use tiny vocab
a2b86b74
fix data dir
ae9f83c1
fix arg
ecdda509
change micro-batch-size
c170fd9d
adapt input ids
c82d6154
assertIn
3587b52c
change micro batch size
a90a8f99
Fix test TP
982d88c2
unused var
78b76861
add test make_vocab_size_divisible_by
c9222042
fix test_tokenizer_vocab_size_multiple_of_tp_size test
806cbb5f
Fix padded vocab size on preprocessing scripts (#257)
f515b67f
documentation
02f86f57
thomasw21
approved these changes
on 2022-03-01
SaulLu
merged
58d92042
into main 3 years ago
SaulLu
deleted the LS/vocab_size branch 3 years ago