vllm
d0e6514b - [Attention] Add TOKENSPEED_MLA backend for DeepSeek R1 prefill + decode on Blackwell

Commit
11 days ago
Wires the tokenspeed_mla CuTe DSL kernels into vLLM as a new MLA backend, covering both prefill (tokenspeed_mla_prefill) and decode (tokenspeed_mla_decode).

Targets Blackwell (SM100) with FP8 KV cache and DeepSeek R1 MLA dimensions; users opt in via -ac '{"backend":"TOKENSPEED_MLA","mla_prefill_backend":"TOKENSPEED_MLA"}'.

Includes numeric parity tests against the trtllm reference kernels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
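
The opt-in value passed to -ac in the commit message is a JSON object, and it is easy to get wrong when typed by hand inside shell quotes. A minimal sketch of building that argument programmatically follows. The two keys and their values are copied from the commit message; the helper name build_ac_args is hypothetical and is not part of vLLM.

```python
import json
import shlex

# Keys and values copied verbatim from the commit message.
attention_config = {
    "backend": "TOKENSPEED_MLA",
    "mla_prefill_backend": "TOKENSPEED_MLA",
}


def build_ac_args(config: dict) -> list[str]:
    """Return the ["-ac", "<json>"] argument pair for a config dict.

    Hypothetical helper for illustration only; not a vLLM API.
    """
    # Compact separators keep the JSON free of spaces.
    return ["-ac", json.dumps(config, separators=(",", ":"))]


args = build_ac_args(attention_config)

# shlex.join quotes the JSON so it survives a POSIX shell unchanged.
print(shlex.join(args))
```

The resulting list can be appended to whatever vLLM launch command is already in use; passing it as a list to subprocess avoids shell quoting entirely.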