Megatron-DeepSpeed
Add LRU cache, add faster tokenization
#37
Merged

huu4ontocord merged 11 commits into bigscience-workshop:main from main
huu4ontocord Update gpt2_tokenization.py
160e2bd5
huu4ontocord Update gpt2_tokenization.py
e7c3d51c
thomasw21 commented on 2021-08-03
huu4ontocord Update gpt2_tokenization.py
dc800086
huu4ontocord Update preprocess_data.py
a405b9ea
huu4ontocord Update gpt2_tokenization.py
54ab4e37
huu4ontocord Merge branch 'bigscience-workshop:main' into main
e729aba9
thomasw21 approved these changes on 2021-08-04
huu4ontocord Update megatron/tokenizer/gpt2_tokenization.py
35011493
huu4ontocord changed the title from "Add LRU cache, add faster tokenization, and add optional Chinese tokenization." to "Add LRU cache, add faster tokenization" 4 years ago
huu4ontocord Update gpt2_tokenization.py
75cce0bc
stas00 approved these changes on 2021-08-04
huu4ontocord Update megatron/tokenizer/gpt2_tokenization.py
cc579250
huu4ontocord Update gpt2_tokenization.py
18118923
huu4ontocord Update gpt2_tokenization.py
02b2d2fb
huu4ontocord merged 36284576 into main 4 years ago
