openvino
[WIP] [DO NOT MERGE] VLLM integration with torch.compile OpenVINO backend
#36346

Open


ynimmaga wants to merge 54 commits into openvinotoolkit:master from ynimmaga:vllm_dev

6cc529f7 [PT FE] Fixes for vLLM torch.compile OpenVINO backend
c3897a82 [PT FE] Translate vLLM attention to PagedAttentionExtension
9241d86a [PT FE] Fix vLLM PA side-channel KV binding to share memory with vLLM
6da1df7f [PT FE] Cache per-layer KV ov.Tensor wrappers across decode steps
87284347 [PT FE] Share per-seq PA metadata Parameters across layers + derive s…
f8a862f3 [PT FE] Derive past_lens / max_context_len from seq_lens + qsl in-graph
bdd01958 [PT FE] Prefer static Reshape for q/k/v rank-2 flatten in PA translator
65c71183 [PT FE] Set INFERENCE_PRECISION_HINT=bf16 for CPU to match genai
fd3d55b5 [CPU] Normalize vLLM-style RoPE subgraphs so RoPEFusion matches
a365112b [PT FE] Fence PA op in f32 and share per-seq metadata Parameters
c6cbeef2 [PT FE] Fix PA side-channel binding for multi-layer vLLM models
deba18fa [PT FE] Route FC through oneDNN BRGEMM via decompression rt_info
cabfe35f [PT FE] Add env-gated thread/affinity knobs for vLLM CPU path
f2f78573 [PT FE] Bind dummy PA KV cache with correct Hk/block/head_size
7eebeca3 [PT FE] Auto-detect KV geometry and default to affinity-unbind
fef9616e [PT FE] Reduce Python overhead in PA side-channel bind
604483e0 [CPU] Initialize PA _slot_mapping to -1 to avoid OOB on unassigned to…
a59f597e [PT FE] Compile OV graphs with dynamic shapes by default
2766a34a torchdynamo: switch fx_openvino non-AOT path to real-tensor make_fx
610ee24e vLLM CPU SPR: unblock OV path end-to-end with PA + AMX bf16
1ab700cb torchdynamo: vLLM general plugin to wire backend OOB
e58b6fa6 torchdynamo: unwrap vLLM Parameter subclasses before make_fx
53dbd451 Revert "torchdynamo: unwrap vLLM Parameter subclasses before make_fx"
b43d5cdf torchdynamo: drop experimental dtype-coercion paths in openvino_execute
9bf1c70b Drop unneeded core/common_translators changes from PR
1b3066a7 transformations: NormalizeVLLMMLP pass to canonicalize gate-up split
f0894e78 intel_cpu: enable LLMMLP fusion for vLLM rank-2 activations
6a338432 intel_cpu: enable QKVProjection fusion for vLLM rank-2 GQA graphs
804842b0 transformations: EraseRedundantConvertPair pass
d48ac65e transformations: dtype-agnostic optional Convert in fusion patterns
d4d4e237 intel_cpu: cap LLMMLP parallel_nt_static to used_nthr at execute time
f0fd5568 transformations: identity-Convert removal + extra cleanup passes
bc4699c2 torchdynamo: re-enable onednn for lm_head outside the OV-traced graph
3d0a7f3a torchdynamo: stop overriding caller affinity by default
41264a10 torchdynamo: route plugin options through torch.compile options dict
80cd3f7a torchdynamo: drop env fallbacks for option-routed flags
5a819d38 torchdynamo: detect OV-active backend from vllm_config, drop OV_VLLM_PA
912832d1 pytorch frontend: drop OV_PA_DTYPE env, use input dtype
3c53bd2b torchdynamo: remove PA debug instrumentation
6228b9be transformations: NormalizeVLLMQKV pass shrinks intel_cpu QKV patch
fd023158 Revert "transformations: NormalizeVLLMQKV pass shrinks intel_cpu QKV …
4017f263 Revert "Revert "transformations: NormalizeVLLMQKV pass shrinks intel_…
dddcff2c transformations: NormalizeVLLMQKV - walk through optional Convert + u…
74f965c7 intel_cpu: drop QKV pattern axis-input flexibility (subsumed by Norma…
c2003225 intel_cpu: revert LLMMLP parallel_nt_static used_nthr cap
7b3bb8cc transformations: sink Convert past VariadicSplit in NormalizeVLLMQKV
db83a034 transformations: NormalizeVLLMMLP accepts Gelu activation (Gemma-3 su…
2af320b9 torchdynamo: OV-fused sampler fast path for vLLM v1
0aa932c7 torchdynamo: move vLLM glue into a dedicated vllm/ subpackage
f28a4bbe torchdynamo: extract vLLM-specific helpers into vllm/ subpackage
c64f88ca torchdynamo: extract more vLLM hooks into vllm/ subpackage
9b9bfd69 torchdynamo: extract vLLM KV-cache config defaults into vllm/
1059c0e7 torchdynamo: extract MatMul fp16/bf16 decompression rewrite to vllm/
3029753d torchdynamo: move _config_with_vllm_defaults into vllm/preset

github-actions added labels: category: CPU, category: Python API, category: transformations, category: dependency_changes, category: CPP API, category: PyTorch FE

Reviewers: No reviews
Assignees: No one assigned
Labels: category: CPU, category: Python API, category: transformations, category: dependency_changes, category: CPP API, category: PyTorch FE
Milestone: No milestone
