transformers
4e441e52 - fix LayoutLMv3TokenizerFast subword label after 'Ġ' token (#21695)

Commit
2 years ago
fix LayoutLMv3TokenizerFast subword label after 'Ġ' token (#21695) LayoutLMv3TokenizerFast produces empty 'Ġ' token with `offset_mapping = (0, 0)`. Next token is wrongly assumed to also be beginning of word and isn't correctly assigned `pad_token_label`. Modify test with text that produce 'Ġ' token. Remove copy check from LayoutLMv2TokenizerFast for `_batch_encode_plus`. solves issue: #19978
Parents
Loading