transformers
8ea9438f - Move output_ids and EOS masking to CPU in _static_sample

1 day ago
Move the output buffer (output_ids) to CPU and perform all token bookkeeping on CPU. This avoids device-side ops that would trigger NEFF (Neuron Executable File Format) recompilation on Neuron.

Changes:
- output_ids allocated on CPU, prompt copied via input_ids.cpu()
- current_ids (device-side input buffer) updated via .copy_() from CPU
- EOS masking done entirely on CPU (no unfinished_sequences.to(device))
- logits_processor receives the full output_ids buffer (static shape)
- output_ids moved back to device before return

On CUDA (A10G, Llama-3.1-8B, 256 tokens): +1.4% vs the _sample baseline.
Sanity check: PASSED (identical greedy output).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
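A minimal sketch of the bookkeeping pattern described above, assuming PyTorch and a greedy-only loop. `static_greedy_decode`, `toy_step` and the bigram table are hypothetical illustrations, not the actual `_static_sample` code. The real method also runs the prompt prefill through the model, uses a KV cache and passes the full `output_ids` buffer to `logits_processor`; all of that is omitted here. The toy model only looks at the last token, so feeding it one token per step is enough.

```python
import torch


def static_greedy_decode(step_fn, input_ids, max_new_tokens, eos_token_id, pad_token_id, device="cpu"):
    """Greedy decoding with a fixed-shape CPU output buffer (hypothetical helper)."""
    batch, prompt_len = input_ids.shape
    max_len = prompt_len + max_new_tokens

    # Output buffer allocated once on CPU; its shape never changes during the loop.
    output_ids = torch.full((batch, max_len), pad_token_id, dtype=torch.long)
    output_ids[:, :prompt_len] = input_ids.cpu()

    # Device-side input buffer, refilled in place with .copy_() every step.
    current_ids = torch.empty((batch, 1), dtype=torch.long, device=device)
    current_ids.copy_(output_ids[:, prompt_len - 1 : prompt_len])

    # EOS bookkeeping stays on CPU; the mask is never moved to the device.
    unfinished = torch.ones(batch, dtype=torch.bool)
    cur_len = prompt_len
    while cur_len < max_len and unfinished.any():
        logits = step_fn(current_ids)                 # (batch, vocab), on device
        next_tokens = logits.argmax(dim=-1).cpu()     # greedy pick, brought to CPU
        # Finished rows emit pad instead of whatever the model predicts.
        next_tokens = torch.where(unfinished, next_tokens, torch.full_like(next_tokens, pad_token_id))
        output_ids[:, cur_len] = next_tokens
        unfinished &= next_tokens != eos_token_id
        current_ids.copy_(next_tokens.unsqueeze(-1))  # CPU -> device, same shape every step
        cur_len += 1

    return output_ids.to(device)


# Toy "bigram" model (hypothetical): token t is always followed by (t + 1) % 5.
# Vocab size is 6 so the pad id (5) is never predicted by the model itself.
VOCAB, EOS, PAD = 6, 4, 5
table = torch.zeros(VOCAB, VOCAB)
for t in range(VOCAB):
    table[t, (t + 1) % 5] = 1.0


def toy_step(ids):
    # ids: (batch, 1) -> one-hot logits of shape (batch, VOCAB)
    return table.to(ids.device)[ids[:, 0]]


prompt = torch.tensor([[1, 2], [3, 3]])
out = static_greedy_decode(toy_step, prompt, max_new_tokens=4, eos_token_id=EOS, pad_token_id=PAD)
print(out.tolist())  # [[1, 2, 3, 4, 5, 5], [3, 3, 4, 5, 5, 5]]
```

In this sketch the per-step device traffic is the `.cpu()` transfer of the chosen tokens plus the `(batch, 1)` copy into `current_ids`, whose shape never changes. The EOS mask, the pad substitution for finished rows and the write into `output_ids` all run on CPU.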