Add OPSD trainer, hybrid-engine rollout, and end-to-end entry point

Lands the fully runnable hybrid-engine training path:

- A backend-agnostic RolloutEngine ABC with RolloutRequest,
  RolloutBatch, and SamplingConfig dataclasses.
- A HybridEngineRollout implementation that uses DeepSpeed's
  accelerated decode when an inference policy exists and otherwise
  falls back to GatheredParameters plus the raw HF generate (covers
  Qwen-family and other models not in DeepSpeed's inference container
  list).
- A left-padded prompt dataset and collator.
- A three-phase trainer loop: rollout -> teacher forward + cache ->
  student forward + streamed KL + backward + step.
- The argparse + deepspeed.initialize entry point.
- Base DeepSpeed ZeRO-3 + hybrid_engine JSON configs.
- A 5-step smoke config and launcher script.
- A 20-prompt math toy dataset for the smoke run.
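
For reviewers, "streamed KL" in the third phase can be read as
accumulating the per-token KL a chunk of positions at a time, so the
full set of per-token terms is never held at once. The pure-Python
sketch below illustrates that reading only. The names streamed_kl,
token_kl, and chunk_size, the list-of-logits layout, and the KL
direction (student relative to teacher) are all assumptions for
illustration and are not taken from this change.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(student_logits: Sequence[float],
             teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at one position (direction is an assumption)."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def streamed_kl(student: Sequence[Sequence[float]],
                teacher: Sequence[Sequence[float]],
                chunk_size: int = 2) -> float:
    """Mean per-token KL, accumulated one chunk of positions at a time.

    Hypothetical helper: only a running scalar is kept between chunks.
    """
    if len(student) != len(teacher) or len(student) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for start in range(0, len(student), chunk_size):
        stop = min(start + chunk_size, len(student))
        total += sum(token_kl(student[i], teacher[i])
                     for i in range(start, stop))
    return total / len(student)
```

The result does not depend on chunk_size beyond floating-point
rounding, so the chunk size can be tuned for memory alone.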

Smoke-validated end-to-end on 2x H200 with a Qwen2.5-0.5B-Instruct
student and a Qwen2.5-1.5B-Instruct teacher; the loss stayed finite for
all 5 steps. The rollout interface contract is covered by
tests/test_rollout_interface.py.
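
As a rough illustration of what a backend-agnostic rollout contract of
this kind could look like, here is a minimal stdlib-only sketch. Only
the class names RolloutEngine, RolloutRequest, RolloutBatch, and
SamplingConfig come from this change. Every field, the generate
signature, the EchoRollout toy backend, and check_contract are
hypothetical and may differ from the real code and tests.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SamplingConfig:
    # Hypothetical fields; the real dataclass may differ.
    max_new_tokens: int = 16
    temperature: float = 1.0


@dataclass
class RolloutRequest:
    prompt_ids: List[List[int]]  # one non-empty token-id list per prompt (assumed)
    sampling: SamplingConfig = field(default_factory=SamplingConfig)


@dataclass
class RolloutBatch:
    prompt_ids: List[List[int]]
    response_ids: List[List[int]]  # one generated sequence per prompt (assumed)


class RolloutEngine(ABC):
    """Backend-agnostic interface: callers only see generate()."""

    @abstractmethod
    def generate(self, request: RolloutRequest) -> RolloutBatch:
        ...


class EchoRollout(RolloutEngine):
    """Toy backend that repeats each prompt's last token; no model involved."""

    def generate(self, request: RolloutRequest) -> RolloutBatch:
        n = request.sampling.max_new_tokens
        responses = [[p[-1]] * n for p in request.prompt_ids]
        return RolloutBatch(request.prompt_ids, responses)


def check_contract(engine: RolloutEngine, request: RolloutRequest) -> RolloutBatch:
    """Hypothetical contract check: one bounded response per prompt."""
    batch = engine.generate(request)
    assert len(batch.response_ids) == len(request.prompt_ids)
    limit = request.sampling.max_new_tokens
    assert all(len(r) <= limit for r in batch.response_ids)
    return batch
```

Any backend that subclasses RolloutEngine can be passed through the
same check, which is the point of keeping the trainer unaware of how
generation is done.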

Signed-off-by: Zhipeng Wang <zhipengbayern@gmail.com>