transformers
ee8603b6 - Replace .item() device-host sync with loop index in _static_sample

Commit
21 days ago
Replace .item() device-host sync with loop index in _static_sample

next_pos.item() forces a device-to-host synchronization on every decode step to convert a device tensor to a Python int. Use the loop index to derive cur_len = prefill_len + i + 2 instead, which is a pure Python operation.

Critical on Neuron (~40ms per sync), minor on CUDA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
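The change described in the commit message can be illustrated with a minimal, torch-free sketch. It is not the actual _static_sample code. FakeDeviceScalar, cur_lens_with_item, and cur_lens_with_index are hypothetical names, and the sketch assumes that next_pos starts at prefill_len + 1 and that cur_len = next_pos + 1, which is one reading consistent with the formula cur_len = prefill_len + i + 2.

```python
class FakeDeviceScalar:
    """Hypothetical stand-in for a 0-d device tensor.

    It only counts how often .item() is called, to model the
    device-to-host synchronizations the commit removes.
    """

    sync_count = 0

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        # Stays "on device": no sync is counted for arithmetic.
        return FakeDeviceScalar(self._value + other)

    def item(self):
        # Each call models one blocking device-to-host sync.
        FakeDeviceScalar.sync_count += 1
        return self._value


def cur_lens_with_item(prefill_len, num_steps):
    """Old pattern: read the position back from the device every step."""
    # Assumption: next_pos starts at prefill_len + 1.
    next_pos = FakeDeviceScalar(prefill_len + 1)
    lens = []
    for _ in range(num_steps):
        lens.append(next_pos.item() + 1)  # one sync per decode step
        next_pos = next_pos + 1
    return lens


def cur_lens_with_index(prefill_len, num_steps):
    """New pattern: derive cur_len from the loop index, no sync at all."""
    return [prefill_len + i + 2 for i in range(num_steps)]
```

Under these assumptions both functions return the same lengths, but the first pays one synchronization per decode step while the second pays none, which is where the per-step saving on Neuron would come from.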