transformers
4563ba2c - Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797)

Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797)

* Fix StopStringCriteria to handle tokens above len(tokenizer)

  This fixes #35244 by clipping token IDs to be within the tokenizer's
  vocabulary size before performing the embedding lookup. This prevents
  index errors when model.config.vocab_size > len(tokenizer).

  The fix:
  1. Adds a clamp operation to ensure token IDs are within bounds
  2. Adds a test case to verify the behavior

* Use self.stop_strings instead of stop_strings
* Handle clipping correctly
* make fixup
* Update test to the new embedding vecs
* Use much bigger values in the mismatch test
* Typo fix
* Slight simplification

---------

Co-authored-by: openhands <openhands@all-hands.dev>
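The core idea of the fix — clipping token IDs to the tokenizer's vocabulary size before an embedding lookup — can be sketched as follows. This is a minimal illustration, not the actual patch: the function name `clamp_token_ids` and the example vocabulary sizes are hypothetical, chosen to mimic the situation where `model.config.vocab_size` exceeds `len(tokenizer)` (e.g. because the embedding matrix is padded for efficiency).

```python
import torch


def clamp_token_ids(input_ids: torch.Tensor, tokenizer_len: int) -> torch.Tensor:
    """Clip token IDs so an embedding sized to len(tokenizer) never sees
    an out-of-range index.

    Models sometimes generate IDs in the padded range
    [len(tokenizer), model.config.vocab_size); without clamping, indexing
    an embedding table of size len(tokenizer) with such an ID raises an
    index error.
    """
    # Clamp to the last valid tokenizer ID; IDs already in range pass through.
    return torch.clamp(input_ids, max=tokenizer_len - 1)


# Hypothetical mismatch: model vocab is 32128 entries, tokenizer has 32100.
ids = torch.tensor([[5, 32099, 32127]])
clamped = clamp_token_ids(ids, tokenizer_len=32100)
print(clamped.tolist())  # [[5, 32099, 32099]]
```

With the clamped IDs, the subsequent embedding lookup (an `nn.Embedding` of size `len(tokenizer)` in the real criteria code) can no longer go out of bounds; IDs in the padded range simply map to the last valid row, which is harmless for stop-string matching since those tokens carry no string content.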