DeepSpeed
a897889e - Add continuous batching with shared prefix, KV trim, and early exit to OPSD rollout engine

Commit
27 days ago
Port four generation optimizations from the GRPO trainer into HybridEngineRollout:

1. Shared prefix: prefill the prompt KV cache once, then expand it for n samples.
2. Continuous batching: keep a fixed slot count and replace each finished slot with the next pending rollout via left-padded KV injection.
3. KV cache left-trim: trim common leading padding (threshold=16).
4. Early exit / batch compaction: shrink the batch via reorder_cache once no pending rollouts remain.

Activated via RolloutConfig.continuous_batching_size > 0. When it is 0 (the default), rollout falls back to HF generate (unchanged behavior).

Validated on 2xA100-40GB with a Qwen2.5-0.5B student and a 1.5B teacher, n_samples_per_prompt=32, cb_size=8; 5 steps completed with finite loss.

Signed-off-by: Guoyang Ma <gma@dgx-a100-a>
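A toy, pure-Python sketch of how continuous batching (item 2) and early exit / batch compaction (item 4) interact. It assumes each rollout's decode length is known up front, as a stand-in for sampling until EOS, and it tracks rollout ids only, not KV tensors. `continuous_batching_schedule` and `rollout_lengths` are hypothetical names for illustration, not DeepSpeed APIs.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def continuous_batching_schedule(
    rollout_lengths: List[int], num_slots: int
) -> Tuple[Dict[int, int], List[int]]:
    """Hypothetical toy scheduler.

    rollout_lengths[i] is an assumed number of decode steps rollout i needs
    before it finishes. Returns (finish step per rollout, batch size per step).
    """
    if num_slots < 1 or any(n < 1 for n in rollout_lengths):
        raise ValueError("need num_slots >= 1 and every length >= 1")
    pending: Deque[int] = deque(range(len(rollout_lengths)))
    # Each active slot is [rollout_id, tokens_generated_so_far].
    slots: List[List[int]] = []
    while pending and len(slots) < num_slots:
        slots.append([pending.popleft(), 0])

    finished_at: Dict[int, int] = {}
    batch_sizes: List[int] = []
    step = 0
    while slots:
        step += 1
        batch_sizes.append(len(slots))
        # One decode step for every row currently in the batch.
        for slot in slots:
            slot[1] += 1
        keep: List[List[int]] = []
        for rid, generated in slots:
            if generated < rollout_lengths[rid]:
                keep.append([rid, generated])
                continue
            finished_at[rid] = step
            if pending:
                # Continuous batching: the freed slot takes the next pending rollout.
                keep.append([pending.popleft(), 0])
            # With nothing pending, the slot is dropped (batch compaction).
        slots = keep
    return finished_at, batch_sizes


finished, sizes = continuous_batching_schedule([3, 1, 2, 4], num_slots=2)
print(finished, sizes)
```

With two slots, the batch stays full while rollouts are pending and shrinks to one row for the final straggler, so the trailing decode steps run on a batch of 1 instead of carrying finished rows.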
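A second toy sketch, for the left-padded KV injection in item 2 and the left-trim in item 3. Each row is a Python list standing in for one slot's KV positions, with `PAD` (None) marking a padded, masked position. `PAD`, `inject_left_padded`, `leading_pad`, and `left_trim` are hypothetical names; the engine itself works on KV tensors and attention masks, not lists.

```python
from typing import List, Optional

Row = List[Optional[int]]
PAD = None  # hypothetical marker for a padded (masked) KV position


def inject_left_padded(cache: List[Row], slot: int, prompt: List[int]) -> None:
    """Place a new prompt in `slot`, left-padding it to the cache width."""
    width = len(cache[0])
    if len(prompt) > width:
        # Prompt is longer than the cache: grow every row on the left.
        grow = len(prompt) - width
        for i, row in enumerate(cache):
            cache[i] = [PAD] * grow + row
        width = len(prompt)
    cache[slot] = [PAD] * (width - len(prompt)) + list(prompt)


def leading_pad(row: Row) -> int:
    """Count padded positions at the start of a row."""
    n = 0
    for tok in row:
        if tok is not PAD:
            break
        n += 1
    return n


def left_trim(cache: List[Row], threshold: int = 16) -> int:
    """Drop leading columns that are padding in every row, if enough of them exist."""
    common = min(leading_pad(row) for row in cache)
    if common < threshold:
        return 0
    for i, row in enumerate(cache):
        cache[i] = row[common:]
    return common


# Slot 0 held a long sequence; slot 1 is heavily left-padded.
cache: List[Row] = [list(range(30)), [PAD] * 25 + [1, 2, 3, 4, 5]]
inject_left_padded(cache, 0, [7, 8, 9, 10])  # slot 0 finished; inject a short prompt
print(left_trim(cache), cache)
```

After the short prompt replaces the long sequence, 25 leading columns are padding in both rows, which clears the threshold of 16, so they are trimmed and every later decode step attends over 5 positions instead of 30. The threshold avoids reshaping the cache for small gains.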