Fix position_ids for left-padded batches in _static_sample

Without this, _static_sample sets position_ids = cache_position for all
batch elements, which is only correct when there is no left-padding.
With left-padded batches (required for decoder-only batched generation),
batch elements have different padding amounts, so each row's real tokens
start at a different cache index and each row needs its own offset.
Compute a per-row position_offset = prefill_len - attention_mask.sum()
(the number of left-pad tokens in that row) once before the loop, then
set position_ids = cache_position - position_offset at each step.
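
A worked example of the arithmetic, as a minimal sketch with plain
Python lists. The helper names position_offsets and step_position_ids
are hypothetical and only mirror the description above; the actual
code presumably operates on tensors rather than lists.

```python
# Hypothetical illustration of the offset arithmetic; not the real code.

def position_offsets(attention_mask):
    """Per-row left-pad count: prefill_len - sum of that row's mask."""
    prefill_len = len(attention_mask[0])
    return [prefill_len - sum(row) for row in attention_mask]


def step_position_ids(cache_position, offsets):
    """Per-row position of the token written at cache_position."""
    return [cache_position - off for off in offsets]


# Two prompts left-padded to length 5: row 0 has 2 pad tokens, row 1 none.
attention_mask = [
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

# Computed once, before the decode loop.
offsets = position_offsets(attention_mask)  # [2, 0]

# First decode step after prefill writes to cache slot 5.
position_ids = step_position_ids(5, offsets)  # [3, 5]
```

Row 0 holds three real tokens (positions 0 to 2), so its next token is
at position 3, not 5. Row 1 has no padding, so its position equals
cache_position, which matches the old behavior.
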
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>