openvino
[WIP] [DO NOT MERGE] VLLM integration with torch.compile OpenVINO backend
#36346
Open
ynimmaga wants to merge 54 commits into openvinotoolkit:master from ynimmaga:vllm_dev
[PT FE] Fixes for vLLM torch.compile OpenVINO backend
6cc529f7
[PT FE] Translate vLLM attention to PagedAttentionExtension
c3897a82
[PT FE] Fix vLLM PA side-channel KV binding to share memory with vLLM
9241d86a
[PT FE] Cache per-layer KV ov.Tensor wrappers across decode steps
6da1df7f
[PT FE] Share per-seq PA metadata Parameters across layers + derive s…
87284347
[PT FE] Derive past_lens / max_context_len from seq_lens + qsl in-graph
f8a862f3
[PT FE] Prefer static Reshape for q/k/v rank-2 flatten in PA translator
bdd01958
[PT FE] Set INFERENCE_PRECISION_HINT=bf16 for CPU to match genai
65c71183
[CPU] Normalize vLLM-style RoPE subgraphs so RoPEFusion matches
fd3d55b5
[PT FE] Fence PA op in f32 and share per-seq metadata Parameters
a365112b
[PT FE] Fix PA side-channel binding for multi-layer vLLM models
c6cbeef2
[PT FE] Route FC through oneDNN BRGEMM via decompression rt_info
deba18fa
[PT FE] Add env-gated thread/affinity knobs for vLLM CPU path
cabfe35f
[PT FE] Bind dummy PA KV cache with correct Hk/block/head_size
f2f78573
[PT FE] Auto-detect KV geometry and default to affinity-unbind
7eebeca3
[PT FE] Reduce Python overhead in PA side-channel bind
fef9616e
[CPU] Initialize PA _slot_mapping to -1 to avoid OOB on unassigned to…
604483e0
[PT FE] Compile OV graphs with dynamic shapes by default
a59f597e
torchdynamo: switch fx_openvino non-AOT path to real-tensor make_fx
2766a34a
vLLM CPU SPR: unblock OV path end-to-end with PA + AMX bf16
610ee24e
torchdynamo: vLLM general plugin to wire backend OOB
1ab700cb
torchdynamo: unwrap vLLM Parameter subclasses before make_fx
e58b6fa6
Revert "torchdynamo: unwrap vLLM Parameter subclasses before make_fx"
53dbd451
torchdynamo: drop experimental dtype-coercion paths in openvino_execute
b43d5cdf
Drop unneeded core/common_translators changes from PR
9bf1c70b
transformations: NormalizeVLLMMLP pass to canonicalize gate-up split
1b3066a7
intel_cpu: enable LLMMLP fusion for vLLM rank-2 activations
f0894e78
intel_cpu: enable QKVProjection fusion for vLLM rank-2 GQA graphs
6a338432
transformations: EraseRedundantConvertPair pass
804842b0
transformations: dtype-agnostic optional Convert in fusion patterns
d48ac65e
intel_cpu: cap LLMMLP parallel_nt_static to used_nthr at execute time
d4d4e237
transformations: identity-Convert removal + extra cleanup passes
f0fd5568
torchdynamo: re-enable onednn for lm_head outside the OV-traced graph
bc4699c2
torchdynamo: stop overriding caller affinity by default
3d0a7f3a
torchdynamo: route plugin options through torch.compile options dict
41264a10
torchdynamo: drop env fallbacks for option-routed flags
80cd3f7a
torchdynamo: detect OV-active backend from vllm_config, drop OV_VLLM_PA
5a819d38
pytorch frontend: drop OV_PA_DTYPE env, use input dtype
912832d1
torchdynamo: remove PA debug instrumentation
3c53bd2b
transformations: NormalizeVLLMQKV pass shrinks intel_cpu QKV patch
6228b9be
Revert "transformations: NormalizeVLLMQKV pass shrinks intel_cpu QKV …
fd023158
Revert "Revert "transformations: NormalizeVLLMQKV pass shrinks intel_…
4017f263
transformations: NormalizeVLLMQKV - walk through optional Convert + u…
dddcff2c
intel_cpu: drop QKV pattern axis-input flexibility (subsumed by Norma…
74f965c7
intel_cpu: revert LLMMLP parallel_nt_static used_nthr cap
c2003225
transformations: sink Convert past VariadicSplit in NormalizeVLLMQKV
7b3bb8cc
transformations: NormalizeVLLMMLP accepts Gelu activation (Gemma-3 su…
db83a034
torchdynamo: OV-fused sampler fast path for vLLM v1
2af320b9
torchdynamo: move vLLM glue into a dedicated vllm/ subpackage
0aa932c7
torchdynamo: extract vLLM-specific helpers into vllm/ subpackage
f28a4bbe
torchdynamo: extract more vLLM hooks into vllm/ subpackage
c64f88ca
torchdynamo: extract vLLM KV-cache config defaults into vllm/
9b9bfd69
torchdynamo: extract MatMul fp16/bf16 decompression rewrite to vllm/
1059c0e7
torchdynamo: move _config_with_vllm_defaults into vllm/preset
3029753d
github-actions added category: CPU
github-actions added category: Python API
github-actions added category: transformations
github-actions added category: dependency_changes
github-actions added category: CPP API
github-actions added category: PyTorch FE
Reviewers: No reviews
Assignees: No one assigned
Labels: category: CPU, category: Python API, category: transformations, category: dependency_changes, category: CPP API, category: PyTorch FE
Milestone: No milestone