transformers
b1527a32 - fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927)

fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927)

* fix(tokenization_utils_tokenizers): avoid parsing the full vocab in `from_file` when only post_processor/padding/truncation are needed
* fix(tokenization_utils_tokenizers): fall back to `from_file` when the model type is missing in tokenizer.json
* fix(tokenization_utils_tokenizers): restrict the minimal-tokenizer optimization to BPE/WordPiece/WordLevel only
* fix(tokenization_utils_tokenizers): add a comment explaining why Unigram and older formats fall back to `from_file`
* apply suggestions
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>