[Core] Add Helix (Context + Tensor) Parallelism #34024 (Open)

sungsooha wants to merge 39 commits into vllm-project:main from sungsooha:helix-parallelism.
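For orientation, "Helix" here means running tensor parallelism (sharding attention heads and weights) together with decode context parallelism (sharding the KV cache across ranks at decode time). The snippet below is a minimal, hypothetical launch sketch of how such a configuration might be expressed with vLLM's offline API; the model name is only an example, and the decode_context_parallel_size argument (like the helix_mode knob mentioned in the commits) is an assumption based on this PR's commit messages rather than a confirmed public interface.

```python
# Hypothetical sketch only: pair tensor parallelism (TP) with decode context
# parallelism (DCP), the "Helix" combination this PR targets. The
# decode_context_parallel_size argument name is an assumption taken from the
# commit messages, not a verified vLLM API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example GQA model; not prescribed by the PR
    tensor_parallel_size=4,          # TP: shard attention heads / weight matrices across 4 GPUs
    decode_context_parallel_size=2,  # DCP: shard the KV cache along the sequence dimension
)

outputs = llm.generate(
    ["Summarize Helix parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```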
sungsooha requested reviews from mgoin, pavanimajety, LucasWilkinson, WoosukKwon, youkaichao, robertgshaw2-redhat, tlrmchlsmth, houseroad, hmellor, yewentao256, and ProExpertProg 1 day ago.
mergify added the documentation, llama, nvidia, and v1 labels.
gemini-code-assist commented on 2026-02-06.
Commits (39):
66d4e6c5  [Helix] Add Helix parallelism for decode context parallel
9ce51f3b  [Helix] Add full GQA support and FlashInfer/MLA integration
c1b08fe5  [Helix] Add GQA model support with proper head distribution
13b71d9f  [Helix] Add helix_mode and DCP to engine init log
791ae8e6  [Helix] Add unit and integration tests
2f62b4d3  [Helix] Add functional tests and documentation
c261a6fa  [Helix] Add functional tests and documentation
a82e5192  [Helix] Fix CUDA fork issue in functional tests
f4cd9454  fix(tests): simplify GPU check to match vLLM test patterns
334d512f  fix(tests): use multi_gpu_test decorator for proper GPU detection
3aaaa044  [Bugfix] Restore DCP->PIECEWISE CUDA graph check
b9f9479b  [Helix] Fix FlashInfer GQA mode head count configuration
7891e1af  [Helix] Skip FlashInferMLA when DCP enabled (no LSE support)
0e96ac3f  [Helix] Fix FlashInfer num_qo_heads computation for GQA mode
9e630662  fix: use vllm_config.parallel_config instead of self.parallel_config
cb475429  fix: access total_num_attention_heads via model_arch_config
2cc76460  fix: compute FlashInfer head counts at build() time for Helix GQA
8aa16cdf  fix: add missing .contiguous() in FlashInfer Helix GQA decode path
0977bf0f  fix: pass is_lse_base_on_e=False for FlashInfer Helix paths
002db495  fix: explicitly set CUTLASS_MLA backend when DCP is enabled
1a6639c4  fix: move lse_query transpose AFTER head scatter in Helix GQA prefill
70aede06  revert: match internal repo FlashInfer Helix implementation exactly
993ef12d  fix(flashinfer): use built-in fast_decode_plan instead of custom impl
ff44c9f8  Revert "fix(flashinfer): use built-in fast_decode_plan instead of cus…
e3a77f1c  feat: add validation to prevent FlashInfer + Helix GQA combination
6d0f71e7  refactor: remove Helix GQA code from FlashInfer backend
402953b1  docs: update Helix documentation and tests for backend compatibility
95a961ca  fix: use _qkv_tp_rank in legacy weight_loader for Helix GQA
9a3ab6d8  fix: remove duplicate q_pad_num_heads in CutlassMLAImpl
9fc96b67  fix: guard get_current_vllm_config() during torch.compile tracing
c793510d  fix: allow FULL CUDA graphs for MLA models with DCP
1a9ff9e2  fix: apply Helix All-to-All for MLA decode in forward_impl
a50c1d6c  perf(helix): add buffer reuse to reduce allocation overhead
afbc7a1e  perf(helix): add packed single-A2A optimization
973781e2  revert: remove A2A optimizations (no measurable benefit)
46e1fa56  [Cleanup] Remove dead MLACommonImpl.forward() method
sungsooha force-pushed from de681c09 to 46e1fa56 1 day ago.
bb0fdd31  fix(tests): add CPU reference implementation for LSE combine (a sketch of this combine step follows the commit list)
51a281d4  style: fix pre-commit lint issues
15daff17  fix: pre-commit markdownlint and mypy errors
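The test commits above reference a CPU implementation of the LSE combine that merges per-rank partial attention results. The idea: each context-parallel rank attends only to its local slice of the KV cache and returns a partial output plus the log-sum-exp (LSE) of its local scores; weighting each partial output by exp(lse_rank - lse_global) and summing reproduces the full-context result exactly. The snippet below is an illustrative PyTorch sketch of that math, not the reference implementation added by this PR.

```python
# Illustrative sketch of the LSE-combine step used when attention is sharded
# across context-parallel ranks; not the code added in this PR.
import torch

def combine_lse(outputs: torch.Tensor, lses: torch.Tensor) -> torch.Tensor:
    """Merge per-rank partial attention outputs using their log-sum-exp values.

    outputs: [num_ranks, num_tokens, num_heads, head_dim]
    lses:    [num_ranks, num_tokens, num_heads]
    returns: [num_tokens, num_heads, head_dim]
    """
    global_lse = torch.logsumexp(lses, dim=0)           # [tokens, heads]
    weights = torch.exp(lses - global_lse)              # [ranks, tokens, heads]
    return (weights.unsqueeze(-1) * outputs).sum(dim=0)

# Sanity check: sharded-then-combined attention matches unsharded attention.
torch.manual_seed(0)
ranks, tokens, heads, head_dim, kv_per_rank = 2, 3, 4, 8, 16
q = torch.randn(tokens, heads, head_dim)
k = torch.randn(ranks, kv_per_rank, heads, head_dim)
v = torch.randn(ranks, kv_per_rank, heads, head_dim)
scale = head_dim ** -0.5

partials, lses = [], []
for r in range(ranks):
    scores = torch.einsum("thd,khd->thk", q, k[r]) * scale       # local attention scores
    lses.append(torch.logsumexp(scores, dim=-1))                  # local LSE per (token, head)
    partials.append(torch.einsum("thk,khd->thd", scores.softmax(-1), v[r]))

combined = combine_lse(torch.stack(partials), torch.stack(lses))

# Unsharded reference over the concatenated KV cache.
k_full, v_full = k.flatten(0, 1), v.flatten(0, 1)
ref_scores = torch.einsum("thd,khd->thk", q, k_full) * scale
reference = torch.einsum("thk,khd->thd", ref_scores.softmax(-1), v_full)
torch.testing.assert_close(combined, reference)
```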
Reviewers: gemini-code-assist, mgoin, pavanimajety, LucasWilkinson, WoosukKwon, youkaichao, robertgshaw2-redhat, tlrmchlsmth, houseroad, hmellor, yewentao256, ProExpertProg
Assignees: none
Labels: documentation, v1, llama, nvidia
Milestone: none