d4dd5dc1 - [Skills][Transformations] Introduce debug-matcher-pass skill (#35127)

### Details:

Introduces a new “debug-matcher-pass” skill that documents an end-to-end workflow for diagnosing why an OpenVINO MatcherPass transformation doesn’t fire, including a filled example diagnosis report to use as a quality reference.

### Usage Examples

#### Example 1:

Prompt:
> I run the following command -- `./benchmark_app -d CPU -m <path>/openvino_model.xml -niter 0` and expect the MoE3GeMMFusion matcher pass to be applied, but it looks like the matcher pass fails. Help me investigate why this transformation is not applied.

<details>
<summary>The skill's output (click to expand):</summary>

## MatcherPass Diagnosis: MoE3GeMMFusion

**Root cause:** All three `MatMul` nodes (gate, up, and down projections) in the model's MoE subgraph have `transpose_b=false`, but the `MoE3GeMMFusion` pattern hard-requires `transpose_b=true` via an `attrs_match` predicate. No explicit `Transpose` node wraps the weight input either, so neither of the pattern's two matching paths can succeed.

**Log evidence:**

```
{ MATCHING PATTERN NODE: WrapType<MatMul>(WrapType, any_input)
├─ AGAINST GRAPH NODE: MatMul(Multiply, Reshape)
├─ PREDICATE `attrs_match({ transpose_b: YES, transpose_a: NO }) && consumers_count(1)` FAILED }
NODES' TYPE MATCHED, but PREDICATE FAILED
```

This phrase appears **30 times** in the matcher log — once per MoE layer candidate — confirming that all 30 match attempts fail at exactly the same point.

**Failing node:** `ov::op::v0::MatMul` (the down-projection, and symmetrically the gate/up-projection MatMuls). The op type is correct; only the `transpose_b` attribute fails the predicate.
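The failure mode above (node type accepted, attribute predicate rejected) can be sketched as a dependency-free toy, together with the flag-propagation idea used in the suggested fix. This models the logic only; it is not the OpenVINO pattern API, and all names here are illustrative:

```cpp
#include <cassert>
#include <string>

// Toy node: just the fields the diagnosis talks about.
struct ToyMatMul {
    std::string type;   // e.g. "MatMul"
    bool transpose_b;   // attribute inspected by the predicate
};

enum class MatchResult { TypeMismatch, PredicateFailed, Matched };

// Models wrap_type<MatMul>(..., attrs_match({{"transpose_b", true}})):
// the node type can match while the attribute predicate still rejects it,
// which is exactly the "NODES' TYPE MATCHED, but PREDICATE FAILED" log line.
MatchResult match_with_hard_requirement(const ToyMatMul& n) {
    if (n.type != "MatMul") return MatchResult::TypeMismatch;
    if (!n.transpose_b)     return MatchResult::PredicateFailed;
    return MatchResult::Matched;
}

// Models the proposed fix: drop the constraint from the pattern and instead
// read the flag in the callback, propagating it to the fused op.
struct ToyFusedOp { bool transpose_b; };
ToyFusedOp fuse_reading_flag(const ToyMatMul& gate_mm) {
    return ToyFusedOp{gate_mm.transpose_b};  // propagate, don't require
}
```

With the hard requirement, a `transpose_b=false` node fails at the predicate stage; with the callback-side read, both flag values are accepted and the flag is carried into the fused op.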
**Resolution:**
- File: `src/plugins/intel_cpu/src/transformations/cpu_opset/common/pass/moe_matmuls_fusion.cpp`, lines ~241, 248, 253
- Remove the `{"transpose_b", true}` constraint from all three `wrap_type<MatMul>` predicate maps and instead read the actual flag inside the callback via `gate_mm_node->get_transpose_b()`, then propagate it to `BatchGatherMatmul`.
- Alternative: add a second pattern branch (via `pattern::op::Or`) covering `transpose_b=false`.

## Reproducer Test

File: `src/plugins/intel_cpu/tests/unit/transformations/moe_matmuls_fusion_test.cpp`
Test name: `MoE3GeMMFusion_TransposeBFalse_TestF.MoE3GeMMFusion_TransposeBFalse_NotApplied`
Status before fix: **PASS (green)** — the transformation does not fire, so the unchanged model equals the auto-cloned `model_ref`.

The matcher log from the test run shows the identical `attrs_match({ transpose_b: YES })` predicate failure as the original model, confirming the reproducer is faithful.

</details>

#### Example 2:

Prompt:
> I need you to investigate why some transformations in the GPU plugin pipeline don't work. I run the following command:
> `cd /home/guest/golubevv/openvino/bin/intel64/Release && ./benchmark_app -d GPU -m subgraph.xml -niter 0`
> and see an issue that is most likely caused by transformation pipeline problems. I know that the following transformations should fire:
>
> 1. FuseVectorizedMOE3GEMM
> 2. ConvertMOEToMOECompressed
> 3. FuseMOE3GemmCompressed
> 4. KeepMOE3GemmConstPrecision
>
> But at least one of them didn't fire.
> Investigate the root cause.

<details>
<summary>The skill's output (click to expand):</summary>

# MatcherPass Diagnosis: `FuseMOE3GemmCompressed`

**Date:** 2026-04-03
**Model:** Trinity-Mini (`subgraph.xml`)
**Run command:** `./benchmark_app -d GPU -m subgraph.xml -compile_only`

---

## Summary of all four passes

| Pass | Result |
|---|---|
| `FuseVectorizedMOE3GEMM` | ✅ Fired (`CALLBACK SUCCEDED`) |
| `ConvertMOEToMOECompressed` | ✅ Fired (`CALLBACK SUCCEDED`) — produced `MOECompressed` nodes |
| **`FuseMOE3GemmCompressed`** | ❌ **Never matched** — root cause |
| `KeepMOE3GemmConstPrecision` | ❌ Never matched — downstream effect: no `MOE3GemmFusedCompressed` exists because step 3 didn't fire; triggers the crash `Input moecompressed: ... hasn't been found in primitive_ids map` |

---

## Root cause

The sigmoid routing normalization branch in the model has an extra `Multiply(Divide, Constant)` node between the normalization `Divide` and the `Slice`. The `FuseMOE3GemmCompressed` pattern's `sig_slice` (defined at line 91 of [fuse_moe_3gemm_compressed.cpp](src/plugins/intel_gpu/src/plugin/transformations/fuse_moe_3gemm_compressed.cpp)) expects:

```
Slice( Divide(...), ... )
```

But the graph has:

```
Slice( Multiply( Divide(...), Constant ), ... )
```

The extra `Multiply` is a routing-weight scaling constant inserted after normalization. It is present in the Trinity-Mini model but was not present in the models the pattern was originally written against.

---

## Log evidence

From both `/tmp/matcher.log` (original run) and the unit test reproducer:

```
{ ARGUMENT 0: WrapType<Divide>
MATCHING PATTERN NODE: WrapType<Divide>(WrapType, WrapType)
AGAINST GRAPH NODE: Multiply(Divide, Constant) }
NODES' TYPE DIDN'T MATCH. EXPECTED: WrapType<Divide>. OBSERVED: Multiply
```

**Location in the match tree:** `MOECompressed` → ARGUMENT 1 (`Unsqueeze`) → ARGUMENT 0 (`Reshape`) → ARGUMENT 0 (`Transpose`) → ARGUMENT 0 (`ScatterElementsUpdate`) → **ARGUMENT 2** (`patternOr(sm_norm_slice | sig_slice)`) → BRANCH 1 (`sig_slice`) → **ARGUMENT 0** (expected `WrapType<Divide>`, observed `Multiply`).

**Failing graph node:** `Multiply(Divide, Constant)` — the routing-weight scaling step inserted between `sig_norm` (Divide) and `sig_slice` (Slice).

---

## Crash consequence

Because `FuseMOE3GemmCompressed` never fires, `KeepMOE3GemmConstPrecision` finds no `MOE3GemmFusedCompressed` nodes and also does nothing. The GPU program builder then fails:

```
Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:265:
Input moecompressed:__module.model.layers.2.mlp/aten::sum/ReduceSum/MOECompressed hasn't been found in primitive_ids map
```

---

## Resolution

In [src/plugins/intel_gpu/src/plugin/transformations/fuse_moe_3gemm_compressed.cpp](src/plugins/intel_gpu/src/plugin/transformations/fuse_moe_3gemm_compressed.cpp) around lines 90–91, add an `optional<Multiply>` wrapper between `sig_norm` and `sig_slice`:

```cpp
// Before:
auto sig_norm  = wrap_type<ov::op::v1::Divide>({sig_gather_el, sig_add_eps}, consumers_count(1));
auto sig_slice = wrap_type<ov::op::v8::Slice>({sig_norm, ANY, ANY, ANY, ANY}, consumers_count(1));

// After:
auto sig_norm        = wrap_type<ov::op::v1::Divide>({sig_gather_el, sig_add_eps}, consumers_count(1));
auto sig_norm_scaled = optional<ov::op::v1::Multiply>({sig_norm, ANY});  // absorbs optional scaling
auto sig_slice       = wrap_type<ov::op::v8::Slice>({sig_norm_scaled, ANY, ANY, ANY, ANY}, consumers_count(1));
```

`optional<Multiply>` will match `Multiply(Divide, Constant)` when present, or fall through to `Divide` directly for models without the extra scaling — no change to the existing passing test cases.
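The effect of the `optional` wrapper can be sketched with a dependency-free toy of the look-through logic. This is illustrative only (toy names, not the real `pattern::optional` API):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy graph node: a type name plus input nodes.
struct ToyNode {
    std::string type;
    std::vector<std::shared_ptr<ToyNode>> inputs;
};

// Models an optional<Multiply> sitting between sig_norm and sig_slice:
// accept a Divide directly, or look through a Multiply whose first input
// is a Divide (the routing-weight scaling case).
bool sig_norm_matches(const std::shared_ptr<ToyNode>& candidate) {
    if (candidate->type == "Divide") return true;              // no scaling step
    if (candidate->type == "Multiply" && !candidate->inputs.empty())
        return candidate->inputs[0]->type == "Divide";         // scaled norm
    return false;
}
```

A plain `Divide` and a `Multiply(Divide, Constant)` both satisfy the check, which is why neither the original models nor the Trinity-Mini graph would be rejected at this point.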
> **Note:** Once the pattern fires, also verify that `MOE3GemmFusedCompressed`'s kernel/op implementation correctly applies the routing-weight scaling factor. The `Multiply` constant is currently not passed as an explicit argument to the fused op — the kernel may need a new input or an internal constant to reproduce correct numerical output.

---

## Reproducer Test

**File:** [src/plugins/intel_gpu/tests/unit/transformations/fuse_moe_3gemm_compressed_test.cpp](src/plugins/intel_gpu/tests/unit/transformations/fuse_moe_3gemm_compressed_test.cpp)
**Test name:** `TransformationTestsF.FuseMOE3GemmCompressed_SigmoidBias_ScaledNorm`
**Build target:** `ov_gpu_unit_tests`
**Run command:**

```bash
cd /home/guest/golubevv/openvino/bin/intel64/Release
OV_MATCHER_LOGGING=true OV_MATCHERS_TO_LOG=FuseMOE3GemmCompressed \
./ov_gpu_unit_tests \
  --gtest_filter="*FuseMOE3GemmCompressed_SigmoidBias_ScaledNorm*"
```

**Status before fix:** PASS ✅ — the transformation does not fire, so the model is unchanged and matches the auto-cloned `model_ref`. This confirms the bug is reproduced. The test log shows the identical failure phrase:

```
NODES' TYPE DIDN'T MATCH. EXPECTED: WrapType<Divide>. OBSERVED: Multiply
AGAINST GRAPH NODE: Slice(Multiply, Constant, ShapeOf, Constant, Constant)
```

**After fix:** the test will FAIL because `model` is now transformed and no longer matches the auto-cloned reference. At that point, add an explicit `model_ref` block with the expected `MOE3GemmFusedCompressed` result graph to turn it into a proper regression guard.

</details>

### Tickets:
- *N/A*

### AI Assistance:
- *yes*
- *AI was used to improve the skill based on real usage examples*

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>