Generation: only search for eos_token if set (#22875)
Generation: only check for eos_token if set
The check on unfinished_sequences.max(), which detects sequences that
have ended early via eos_token_id, forces a synchronization point even
when no eos_token is set, which slows down inference.
This change moves the check inside the condition that tests for
eos_token, so the slowdown can be avoided by disabling this token.
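
A rough sketch of the control flow after this change, for illustration
only. Plain Python lists stand in for device tensors, and SyncCounter,
read_max and decode_loop are hypothetical names, not library code; the
counter only models how often the host would have to wait on the
device.

```python
# Illustrative sketch only: plain Python stands in for device tensors, and
# every name here is hypothetical rather than taken from the library.


class SyncCounter:
    """Counts how often the host would have to wait for a device result."""

    def __init__(self):
        self.count = 0

    def read_max(self, values):
        # Stand-in for reading unfinished_sequences.max() on the host,
        # which is the synchronization point described above.
        self.count += 1
        return max(values)


def decode_loop(next_tokens_per_step, eos_token_id, sync):
    """Run a toy decoding loop and return the number of steps taken."""
    batch_size = len(next_tokens_per_step[0])
    unfinished = [1] * batch_size
    steps = 0
    for next_tokens in next_tokens_per_step:
        steps += 1
        if eos_token_id is not None:
            # Mark sequences that just produced the EOS token as finished.
            unfinished = [
                0 if tok == eos_token_id else u
                for u, tok in zip(unfinished, next_tokens)
            ]
            # The early-exit check lives inside the condition, so it is
            # skipped entirely when no EOS token is configured.
            if sync.read_max(unfinished) == 0:
                break
    return steps
```

In this toy model, calling decode_loop with eos_token_id=None never
touches sync.read_max, so the loop runs to the end of its input without
any simulated synchronization.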
Co-authored-by: John Doe <john.doe@example.com>