DeepSpeed
f9a73e06 - rollout: add FlashInfer kernel manager and vLLM compat shim

Commit
17 days ago
rollout: add FlashInfer kernel manager and vLLM compat shim

- Add FlashInferKernelManager context manager for swapping attention and sampling kernels during decode (framework ready, activation pending Blackwell FlashInfer support)
- Add _vllm_compat/ sitecustomize to handle duplicate template name errors from vLLM 0.22+ Pydantic validation
- Add bench_flashinfer.py benchmark script with --graph-capture flag
- Refactor HybridEngineRollout.generate() to extract _dispatch_generate() for cleaner flashinfer integration path

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
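
For readers unfamiliar with the pattern, the kernel-swapping context manager described in the commit message can be pictured roughly as follows. This is only an illustrative sketch, not the code in this commit: the `KERNELS` registry, the `swap_kernels` helper, and the kernel names are all hypothetical stand-ins, and the real FlashInferKernelManager may be structured quite differently.

```python
from contextlib import contextmanager

# Hypothetical registry standing in for an engine's kernel table.
KERNELS = {"attention": "default_attention", "sampling": "default_sampling"}


@contextmanager
def swap_kernels(registry, **overrides):
    """Temporarily replace entries in `registry`, restoring them on exit."""
    # Remember the current entries for every kernel about to be replaced.
    saved = {name: registry[name] for name in overrides}
    registry.update(overrides)
    try:
        yield registry
    finally:
        # Restore the originals even if the body raised.
        registry.update(saved)


# Usage: the override is visible only inside the `with` block.
with swap_kernels(KERNELS, attention="flashinfer_attention") as active:
    inside = dict(active)
after = dict(KERNELS)
```

Restoring in a `finally` clause means a failed decode step would not leave the alternate kernels installed, which is presumably the reason for using a context manager here.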