transformers
042f4203 - Update pipeline word heuristic to work with whitespace in token offsets (#18402)

Commit
3 years ago
Update pipeline word heuristic to work with whitespace in token offsets (#18402) * Update pipeline word heuristic to work with whitespace in token offsets This change checks for whitespace in the input string at either the character preceding the token or in the first character of the token. This works with tokenizers that return offsets excluding whitespace between words or with offsets including whitespace. fixes #18111 starting * Use smaller model, ensure expected tokenization * Re-run CI (please squash)
Author
Parents
Loading