PR #37 Add LRU cache, add faster tokenization

huu4ontocord merged 11 commits into bigscience-workshop:main from main

Update gpt2_tokenization.py

160e2bd5

Update gpt2_tokenization.py

e7c3d51c

thomasw21 commented on 2021-08-03

Update gpt2_tokenization.py

dc800086

Update preprocess_data.py

a405b9ea

Update gpt2_tokenization.py

54ab4e37

Merge branch 'bigscience-workshop:main' into main

e729aba9

thomasw21 approved these changes on 2021-08-04

Update megatron/tokenizer/gpt2_tokenization.py

35011493

huu4ontocord changed the title ~~Add LRU cache, add faster tokenization, and add optional Chinese tokenization.~~ Add LRU cache, add faster tokenization 4 years ago

Update gpt2_tokenization.py

75cce0bc

stas00 approved these changes on 2021-08-04

Update megatron/tokenizer/gpt2_tokenization.py

cc579250

Update gpt2_tokenization.py

18118923

Update gpt2_tokenization.py

02b2d2fb

huu4ontocord merged 36284576 into main 4 years ago

Reviewers

thomasw21

stas00

Megatron-DeepSpeed Add LRU cache, add faster tokenization #37 Merged
