transformers
20cb3c8e - Move EOS tracking to CPU in _static_sample to avoid blocking device sync

Commit

49 days ago

Move EOS tracking to CPU in _static_sample to avoid blocking device sync

Move unfinished_sequences to CPU and replace the stopping_criteria EOS check with torch.isin(next_tokens.cpu(), eos_token_id_cpu). This replaces the blocking max() device reduction with an async D2H copy plus CPU-only bookkeeping.

No measurable CUDA improvement (+0% on A10G with Llama-3.1-8B), but it eliminates a pipeline stall that is relevant on Neuron/XLA (~40ms per sync).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
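The bookkeeping described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `step_eos_tracking_cpu` and the boolean dtype for `unfinished_sequences_cpu` are assumptions made for this example, and the real `_static_sample` may structure the step differently. Only `torch.isin` and `Tensor.cpu()` are taken from the commit message.

```python
import torch


def step_eos_tracking_cpu(next_tokens, eos_token_id_cpu, unfinished_sequences_cpu):
    """Update per-sequence "unfinished" flags using CPU tensors only.

    Hypothetical helper for illustration; names and dtypes are assumptions.
    next_tokens: 1-D tensor of sampled token ids (may live on an accelerator).
    eos_token_id_cpu: 1-D CPU tensor of EOS token ids.
    unfinished_sequences_cpu: 1-D boolean CPU tensor, True while a sequence is active.
    """
    # Copy the sampled tokens to the host; all later work stays on the CPU.
    next_tokens_cpu = next_tokens.cpu()
    # Membership test against the EOS ids, evaluated on the host.
    is_eos = torch.isin(next_tokens_cpu, eos_token_id_cpu)
    # A sequence stays unfinished only if it was active and did not emit EOS.
    unfinished_sequences_cpu = unfinished_sequences_cpu & ~is_eos
    # The "all finished" decision reads a CPU tensor, so no device reduction is needed.
    all_done = not bool(unfinished_sequences_cpu.any())
    return unfinished_sequences_cpu, all_done
```

In a generation loop, the returned flag would decide whether to stop, so the loop condition never has to read a reduced value back from the device.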

Author

dacorvo

Committer

dacorvo

Parents

55fc5dc9
