transformers
ee8603b6 - Replace .item() device-host sync with loop index in _static_sample

Commit

21 days ago

Replace .item() device-host sync with loop index in _static_sample

next_pos.item() forces a device-to-host synchronization at every decode step to convert a device tensor into a Python int. Derive cur_len from the loop index instead (cur_len = prefill_len + i + 2), which is a pure Python operation and needs no sync.

The saving is critical on Neuron (~40 ms per sync) and minor on CUDA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
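A minimal sketch of the idea, assuming a decode loop in which prefill consumes prefill_len tokens and samples the first new token (so the sequence length is prefill_len + 1 on entry to the loop), and each loop iteration i appends exactly one token. Under that assumption the on-device counter and the loop-index formula always agree, so the host never needs to read the counter back. FakeDeviceScalar and decode_lengths are hypothetical stand-ins, not transformers or PyTorch APIs; the fake scalar only counts how many host syncs an .item() call would trigger.

```python
class FakeDeviceScalar:
    """Hypothetical stand-in for a 0-d device tensor such as next_pos."""

    def __init__(self, value: int) -> None:
        self._value = value
        self.sync_count = 0

    def add_(self, n: int) -> "FakeDeviceScalar":
        # In-place update that would stay on the device: no host sync.
        self._value += n
        return self

    def item(self) -> int:
        # On a real device this blocks until queued work finishes.
        self.sync_count += 1
        return self._value


def decode_lengths(prefill_len: int, max_new_tokens: int, use_item: bool):
    """Return the per-step sequence lengths and the number of host syncs."""
    # Assumption: prefill already produced the first new token.
    next_pos = FakeDeviceScalar(prefill_len + 1)
    lengths = []
    for i in range(max_new_tokens - 1):
        next_pos.add_(1)  # one decode step appends one token
        if use_item:
            cur_len = next_pos.item()  # old approach: device-to-host read
        else:
            cur_len = prefill_len + i + 2  # new approach: pure Python
        lengths.append(cur_len)
    return lengths, next_pos.sync_count
```

Both variants yield the same lengths, but only the .item() variant pays one sync per step: decode_lengths(5, 4, use_item=True) performs 3 syncs, while use_item=False performs none. By the commit's own figure of ~40 ms per sync on Neuron, each avoided .item() call removes that stall from every generated token.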

Author

dacorvo

Committer

dacorvo

Parents

2b2bcb01
