fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927)
* fix(tokenization_utils_tokenizers): avoid parsing full vocab in from_file when only post_processor/padding/truncation are needed
* fix(tokenization_utils_tokenizers): fall back to from_file when model type is missing in tokenizer.json
* fix(tokenization_utils_tokenizers): restrict minimal tokenizer optimization to BPE/WordPiece/WordLevel only
* fix(tokenization_utils_tokenizers): add comment explaining why Unigram and older formats fall back to from_file
* apply suggestions
* fix
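The optimization described above can be sketched roughly as follows. This is a minimal illustration, not the actual transformers implementation: the helper name `load_minimal_config` and the returned dict shape are hypothetical, and only the `tokenizer.json` top-level keys (`model`, `post_processor`, `padding`, `truncation`) are taken from the commits above.

```python
import json

# Model types for which the lightweight path is safe (per the commits above).
FAST_MODEL_TYPES = {"BPE", "WordPiece", "WordLevel"}


def load_minimal_config(tokenizer_json: str):
    """Sketch: read only post_processor/padding/truncation from a
    tokenizer.json payload without materializing the full vocab.

    Returns None when the caller should fall back to the full
    ``from_file``-style parse (hypothetical helper, not the real API).
    """
    config = json.loads(tokenizer_json)
    model = config.get("model") or {}
    model_type = model.get("type")
    if model_type not in FAST_MODEL_TYPES:
        # Unigram and older formats carry state the fast path cannot
        # reconstruct cheaply, so fall back to full parsing.
        return None
    return {
        "post_processor": config.get("post_processor"),
        "padding": config.get("padding"),
        "truncation": config.get("truncation"),
    }
```

The key design point is the allow-list: when `model.type` is missing or is Unigram, the code falls back to the slower full parse rather than risk an incomplete load.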
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>