openvino
[WIP] [DO NOT MERGE] VLLM integration with torch.compile OpenVINO backend
#36346
Open

ynimmaga wants to merge 54 commits into openvinotoolkit:master from ynimmaga:vllm_dev
ynimmaga [PT FE] Fixes for vLLM torch.compile OpenVINO backend
6cc529f7
ynimmaga [PT FE] Translate vLLM attention to PagedAttentionExtension
c3897a82
ynimmaga [PT FE] Fix vLLM PA side-channel KV binding to share memory with vLLM
9241d86a
ynimmaga [PT FE] Cache per-layer KV ov.Tensor wrappers across decode steps
6da1df7f
ynimmaga [PT FE] Share per-seq PA metadata Parameters across layers + derive s…
87284347
ynimmaga [PT FE] Derive past_lens / max_context_len from seq_lens + qsl in-graph
f8a862f3
ynimmaga [PT FE] Prefer static Reshape for q/k/v rank-2 flatten in PA translator
bdd01958
ynimmaga [PT FE] Set INFERENCE_PRECISION_HINT=bf16 for CPU to match genai
65c71183
ynimmaga [CPU] Normalize vLLM-style RoPE subgraphs so RoPEFusion matches
fd3d55b5
ynimmaga [PT FE] Fence PA op in f32 and share per-seq metadata Parameters
a365112b
ynimmaga [PT FE] Fix PA side-channel binding for multi-layer vLLM models
c6cbeef2
ynimmaga [PT FE] Route FC through oneDNN BRGEMM via decompression rt_info
deba18fa
ynimmaga [PT FE] Add env-gated thread/affinity knobs for vLLM CPU path
cabfe35f
ynimmaga [PT FE] Bind dummy PA KV cache with correct Hk/block/head_size
f2f78573
ynimmaga [PT FE] Auto-detect KV geometry and default to affinity-unbind
7eebeca3
ynimmaga [PT FE] Reduce Python overhead in PA side-channel bind
fef9616e
ynimmaga [CPU] Initialize PA _slot_mapping to -1 to avoid OOB on unassigned to…
604483e0
ynimmaga [PT FE] Compile OV graphs with dynamic shapes by default
a59f597e
ynimmaga torchdynamo: switch fx_openvino non-AOT path to real-tensor make_fx
2766a34a
claude vLLM CPU SPR: unblock OV path end-to-end with PA + AMX bf16
610ee24e
ynimmaga torchdynamo: vLLM general plugin to wire backend OOB
1ab700cb
ynimmaga torchdynamo: unwrap vLLM Parameter subclasses before make_fx
e58b6fa6
ynimmaga Revert "torchdynamo: unwrap vLLM Parameter subclasses before make_fx"
53dbd451
ynimmaga torchdynamo: drop experimental dtype-coercion paths in openvino_execute
b43d5cdf
ynimmaga Drop unneeded core/common_translators changes from PR
9bf1c70b
ynimmaga transformations: NormalizeVLLMMLP pass to canonicalize gate-up split
1b3066a7
ynimmaga intel_cpu: enable LLMMLP fusion for vLLM rank-2 activations
f0894e78
ynimmaga intel_cpu: enable QKVProjection fusion for vLLM rank-2 GQA graphs
6a338432
ynimmaga transformations: EraseRedundantConvertPair pass
804842b0
ynimmaga transformations: dtype-agnostic optional Convert in fusion patterns
d48ac65e
ynimmaga intel_cpu: cap LLMMLP parallel_nt_static to used_nthr at execute time
d4d4e237
ynimmaga transformations: identity-Convert removal + extra cleanup passes
f0fd5568
ynimmaga torchdynamo: re-enable onednn for lm_head outside the OV-traced graph
bc4699c2
ynimmaga torchdynamo: stop overriding caller affinity by default
3d0a7f3a
ynimmaga torchdynamo: route plugin options through torch.compile options dict
41264a10
ynimmaga torchdynamo: drop env fallbacks for option-routed flags
80cd3f7a
ynimmaga torchdynamo: detect OV-active backend from vllm_config, drop OV_VLLM_PA
5a819d38
ynimmaga pytorch frontend: drop OV_PA_DTYPE env, use input dtype
912832d1
ynimmaga torchdynamo: remove PA debug instrumentation
3c53bd2b
ynimmaga transformations: NormalizeVLLMQKV pass shrinks intel_cpu QKV patch
6228b9be
ynimmaga Revert "transformations: NormalizeVLLMQKV pass shrinks intel_cpu QKV …
fd023158
ynimmaga Revert "Revert "transformations: NormalizeVLLMQKV pass shrinks intel_…
4017f263
ynimmaga transformations: NormalizeVLLMQKV - walk through optional Convert + u…
dddcff2c
ynimmaga intel_cpu: drop QKV pattern axis-input flexibility (subsumed by Norma…
74f965c7
ynimmaga intel_cpu: revert LLMMLP parallel_nt_static used_nthr cap
c2003225
ynimmaga transformations: sink Convert past VariadicSplit in NormalizeVLLMQKV
7b3bb8cc
ynimmaga transformations: NormalizeVLLMMLP accepts Gelu activation (Gemma-3 su…
db83a034
ynimmaga torchdynamo: OV-fused sampler fast path for vLLM v1
2af320b9
ynimmaga torchdynamo: move vLLM glue into a dedicated vllm/ subpackage
0aa932c7
ynimmaga torchdynamo: extract vLLM-specific helpers into vllm/ subpackage
f28a4bbe
ynimmaga torchdynamo: extract more vLLM hooks into vllm/ subpackage
c64f88ca
ynimmaga torchdynamo: extract vLLM KV-cache config defaults into vllm/
9b9bfd69
ynimmaga torchdynamo: extract MatMul fp16/bf16 decompression rewrite to vllm/
1059c0e7
ynimmaga torchdynamo: move _config_with_vllm_defaults into vllm/preset
3029753d
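
Commit f8a862f3 above mentions deriving `past_lens` and `max_context_len` from `seq_lens` and `qsl`. The following is a minimal sketch of that arithmetic in plain Python, not the PR's in-graph implementation. It assumes `qsl` is a cumulative query-start offset array with a leading 0 (one more entry than there are sequences) and that `seq_lens` holds each sequence's total length including the tokens scheduled in this step. The function name `derive_pa_metadata` is hypothetical.

```python
from typing import List, Tuple


def derive_pa_metadata(seq_lens: List[int], qsl: List[int]) -> Tuple[List[int], int]:
    """Hypothetical helper: derive per-sequence past lengths and the max context length.

    Assumptions (not confirmed by the PR text):
      * qsl[i] is the offset of sequence i's first query token in the flattened
        batch, with qsl[0] == 0 and len(qsl) == len(seq_lens) + 1.
      * seq_lens[i] counts cached tokens plus the new query tokens of sequence i.
    """
    if len(qsl) != len(seq_lens) + 1:
        raise ValueError("qsl must have exactly one more entry than seq_lens")

    # Number of new query tokens per sequence: adjacent differences of the offsets.
    query_lens = [qsl[i + 1] - qsl[i] for i in range(len(seq_lens))]

    # Tokens already present in the KV cache before this step.
    past_lens = [total - new for total, new in zip(seq_lens, query_lens)]

    # Longest total context in the batch (0 for an empty batch).
    max_context_len = max(seq_lens, default=0)
    return past_lens, max_context_len


# One prefill sequence (5 new tokens, nothing cached) and one decode
# sequence (1 new token, 8 cached).
past_lens, max_context_len = derive_pa_metadata([5, 9], [0, 5, 6])
```

Computing these values from inputs the caller already supplies means fewer separate metadata tensors have to be bound per step, which appears to be the motivation behind the commit title.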
github-actions added category: CPU
github-actions added category: Python API
github-actions added category: transformations
github-actions added category: dependency_changes
github-actions added category: CPP API
github-actions added category: PyTorch FE

Reviewers
No reviews
Assignees
No one assigned