vllm
[Core] Add Helix (Context + Tensor) Parallelism
#34024
Open

sungsooha wants to merge 39 commits into vllm-project:main from sungsooha:helix-parallelism
sungsooha requested a review from mgoin 1 day ago
sungsooha requested a review from pavanimajety 1 day ago
sungsooha requested a review from LucasWilkinson 1 day ago
sungsooha requested a review from WoosukKwon 1 day ago
sungsooha requested a review from youkaichao 1 day ago
sungsooha requested a review from robertgshaw2-redhat 1 day ago
sungsooha requested a review from tlrmchlsmth 1 day ago
sungsooha requested a review from houseroad 1 day ago
sungsooha requested a review from hmellor 1 day ago
sungsooha requested a review from yewentao256 1 day ago
sungsooha requested a review from ProExpertProg 1 day ago
mergify added documentation
mergify added llama
mergify added nvidia
mergify added v1
gemini-code-assist commented on 2026-02-06
sungsooha [Helix] Add Helix parallelism for decode context parallel
66d4e6c5
sungsooha [Helix] Add full GQA support and FlashInfer/MLA integration
9ce51f3b
sungsooha [Helix] Add GQA model support with proper head distribution
c1b08fe5
sungsooha [Helix] Add helix_mode and DCP to engine init log
13b71d9f
sungsooha [Helix] Add unit and integration tests
791ae8e6
sungsooha [Helix] Add functional tests and documentation
2f62b4d3
sungsooha [Helix] Add functional tests and documentation
c261a6fa
sungsooha [Helix] Fix CUDA fork issue in functional tests
a82e5192
sungsooha fix(tests): simplify GPU check to match vLLM test patterns
f4cd9454
sungsooha fix(tests): use multi_gpu_test decorator for proper GPU detection
334d512f
sungsooha [Bugfix] Restore DCP->PIECEWISE CUDA graph check
3aaaa044
sungsooha [Helix] Fix FlashInfer GQA mode head count configuration
b9f9479b
sungsooha [Helix] Skip FlashInferMLA when DCP enabled (no LSE support)
7891e1af
sungsooha [Helix] Fix FlashInfer num_qo_heads computation for GQA mode
0e96ac3f
sungsooha fix: use vllm_config.parallel_config instead of self.parallel_config
9e630662
sungsooha fix: access total_num_attention_heads via model_arch_config
cb475429
sungsooha fix: compute FlashInfer head counts at build() time for Helix GQA
2cc76460
sungsooha fix: add missing .contiguous() in FlashInfer Helix GQA decode path
8aa16cdf
sungsooha fix: pass is_lse_base_on_e=False for FlashInfer Helix paths
0977bf0f
sungsooha fix: explicitly set CUTLASS_MLA backend when DCP is enabled
002db495
sungsooha fix: move lse_query transpose AFTER head scatter in Helix GQA prefill
1a6639c4
sungsooha revert: match internal repo FlashInfer Helix implementation exactly
70aede06
sungsooha fix(flashinfer): use built-in fast_decode_plan instead of custom impl
993ef12d
sungsooha Revert "fix(flashinfer): use built-in fast_decode_plan instead of cus…
ff44c9f8
sungsooha feat: add validation to prevent FlashInfer + Helix GQA combination
e3a77f1c
sungsooha refactor: remove Helix GQA code from FlashInfer backend
6d0f71e7
sungsooha docs: update Helix documentation and tests for backend compatibility
402953b1
sungsooha fix: use _qkv_tp_rank in legacy weight_loader for Helix GQA
95a961ca
sungsooha fix: remove duplicate q_pad_num_heads in CutlassMLAImpl
9a3ab6d8
sungsooha fix: guard get_current_vllm_config() during torch.compile tracing
9fc96b67
sungsooha fix: allow FULL CUDA graphs for MLA models with DCP
c793510d
sungsooha fix: apply Helix All-to-All for MLA decode in forward_impl
1a9ff9e2
sungsooha perf(helix): add buffer reuse to reduce allocation overhead
a50c1d6c
sungsooha perf(helix): add packed single-A2A optimization
afbc7a1e
sungsooha revert: remove A2A optimizations (no measurable benefit)
973781e2
sungsooha [Cleanup] Remove dead MLACommonImpl.forward() method
46e1fa56
sungsooha force pushed from de681c09 to 46e1fa56 1 day ago
sungsooha fix(tests): add CPU reference implementation for LSE combine
bb0fdd31
sungsooha style: fix pre-commit lint issues
51a281d4
sungsooha fix: pre-commit markdownlint and mypy errors
15daff17
