transformers
b8805084 - tokenization_marian.py: use current_spm for decoding (#10357)

Commit

4 years ago

tokenization_marian.py: use current_spm for decoding (#10357)

* Fix Marian decoding

Tokenizer's decode and batch_decode now accept a new argument (use_source_tokenizer) which indicates whether the source spm should be used to decode ids. This is useful for Marian models specifically when decoding source input ids.

* Adapt docstrings

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
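The effect of use_source_tokenizer can be illustrated with a minimal, self-contained sketch. This is not the code from the commit and does not mirror the library's internals: ToyTokenizer and its two id-to-piece tables are hypothetical, invented only to make the difference visible. The only names taken from the commit message are decode, batch_decode, and use_source_tokenizer.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer with source and target decoders."""

    def __init__(self, source_pieces, target_pieces):
        # Assumption for illustration: one id -> piece table per language.
        self.source_pieces = source_pieces
        self.target_pieces = target_pieces

    def decode(self, token_ids, use_source_tokenizer=False):
        # Pick the source-side table only when explicitly requested;
        # the default decodes with the target side.
        pieces = self.source_pieces if use_source_tokenizer else self.target_pieces
        text = "".join(pieces[i] for i in token_ids)
        # SentencePiece-style pieces mark word boundaries with U+2581.
        return text.replace("\u2581", " ").strip()

    def batch_decode(self, sequences, use_source_tokenizer=False):
        # Apply the same choice of decoder to every sequence in the batch.
        return [
            self.decode(ids, use_source_tokenizer=use_source_tokenizer)
            for ids in sequences
        ]


tok = ToyTokenizer(
    source_pieces={0: "\u2581Hello", 1: "\u2581world"},
    target_pieces={0: "\u2581Hallo", 1: "\u2581Welt"},
)

ids = [0, 1]
source_text = tok.decode(ids, use_source_tokenizer=True)
target_text = tok.decode(ids)
batch_text = tok.batch_decode([ids, [1]], use_source_tokenizer=True)
```

In this sketch the same ids decode to different text depending on the flag, which is why decoding source input ids without it can produce the wrong string.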

References

#10357 - tokenization_marian.py: use current_spm for decoding

Author

Mehrad0711

Parents

8fd7eb34
