Terminator strings for generate() #28932
gante commented on 2024-02-14
gante approved these changes on 2024-02-28
stash commit (will discard all of this)
262537f0
stash commit
cfa538b0
First commit - needs a lot of testing!
127182a9
Add a test
8cd60591
Fix imports and make the tests actually test something
5fde7aeb
Tests pass!
ff02b0cd
Rearrange test
4ce1aba0
Add comments (but it's still a bit confusing)
1742b681
Stop storing the tokenizer
9fb77e33
Comment fixup
667d6d88
Fix for input_ids with a single sequence
070a76e8
Update tests to test single sequences
4c436f2c
make fixup
78b0f247
Fix incorrect use of isin()
8ee5762e
Expand tests to catch more cases
9f43a2a6
Expand tests to catch more cases
f0fa7074
make fixup
5bcf5e47
Fix length calculation and update tests
8cca9a40
Handle Ġ as a space replacement too
ec6f7265
Update src/transformers/generation/stopping_criteria.py
0e632c2f
Add optimizations from Joao's suggestion
ac1135c2
Remove TODO
27318270
Update src/transformers/generation/stopping_criteria.py
9213298f
Update tests/generation/test_stopping_criteria.py
f48522e1
make fixup
7a772b8f
Rename some variables and remove some debugging clauses for clarity
c604a2ba
Add tests for the sub-methods
7dd346af
Clarify one test slightly
641ba727
Add stop_strings to GenerationConfig
f6721a5a
generate() supports stop_string arg, asks for tokenizer if not provided
8772bcbb
make fixup
e423417a
Cleanup code and rename variables for clarity
398a799f
Update tokenizer error
e3140a68
Update tokenizer passing, handle generation on GPU
0008722d
Slightly more explanation cleanup
a29c131e
More comment cleanup
9c359ffe
Factor out the token cleanup so it's more obvious what we're doing, a…
602222dc
Careful with that cleanup!
4c7a7777
Cleanup + optimizations to _get_matching_positions
b6e01639
More minor performance tweaks
43d9e084
Implement caching and eliminate some expensive ops (startup time: 200…
60eb5769
Remove the pin_memory call
ff422118
Parallelize across all stop strings!
ae800a66
Quick fix for tensor devices
46c0a9c6
Update embeddings test for the new format
b9a066d3
Fix test imports
692523c4
Manual patching for BERT-like tokenizers
2ba7f8ed
Return a bool vector instead of a single True/False
8b95ec15
Better comment
1b46b208
Better comment
350a850e
Add tests from @zucchini-nlp
0b85c6c6
Amy's list creation nit
e8c769d2
tok_list -> token_list
14de1c3c
Push a big expanded docstring (should we put it somewhere else?)
b8961e8d
Expand docstrings
7ed55ad2
Docstring fixups
cbb9d147
Rebase
7db95c1a
make fixup
49b0f21e
Make a properly general method for figuring out token strings
c9aefe64
Fix naming throughout the functions
443cd5d6
Move cache, refactor, fix tests
e1c9c0e0
Add comment
f49ec00b
Remove finished TODO
e90aaba5
Remove finished TODO
bb27d82e
make fixup
43170197
Update src/transformers/generation/stopping_criteria.py
19df6a82
Update and shorten docstring
8b520391
Update tests to be shorter/clearer and test specific cases
0aa201cb
Rocketknight1 deleted the terminator_strings_for_generate branch 1 year ago
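For readers skimming the commit list: the behaviour this PR is after (stop generating once the output ends with one of the given strings, and report that per sequence as in "Return a bool vector instead of a single True/False") can be sketched on already-decoded text. This is only an illustration under that assumption. The function name `ends_with_stop_string` is hypothetical and not part of the PR, and the sketch ignores the token-level matching, caching and tensor handling that the commits above refer to.

```python
from typing import List, Sequence


def ends_with_stop_string(texts: Sequence[str], stop_strings: Sequence[str]) -> List[bool]:
    """Hypothetical helper: one flag per sequence, True if it ends with any stop string."""
    # str.endswith accepts a tuple of suffixes and returns True if any of them matches.
    stops = tuple(stop_strings)
    return [text.endswith(stops) for text in texts]


# Two decoded sequences in a batch; only the first has reached a stop string.
done = ends_with_stop_string(["The answer is 42.\n\n", "Still going"], ["\n\n", "END"])
```

Returning one flag per sequence, rather than a single boolean for the whole batch, lets a caller finish some sequences while the others keep generating.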