openvino
ca968e4d - [NPUW] Fix multiple MatMul matching issue in NPUW LM head cutting (#32475)

Commit

149 days ago

[NPUW] Fix multiple MatMul matching issue in NPUW LM head cutting (#32475)

### Details:

**Background:** The Eagle 3 pipeline adds a new output to the target model to expose the intermediate feature embeddings. The `cut_lm_head` function separates the vocabulary matrix (LM head) from LLM models for efficient inference. It needs to identify the correct `MatMul` operation among multiple candidates in the model graph.

**Problem:** When multiple `MatMul` operations match the pattern (common in LLMs), the callback executes once per match, and each execution overwrites the previous result. Only the last matched `MatMul` is used, which is often not the actual vocabulary matrix.

**Solution:** Replaced `MatcherPass` with direct traversal and intelligent selection:

1. Collect all candidates instead of using the last match.
2. Select the `MatMul` with the largest matrix size (vocabulary size heuristic).
3. Optimize traversal: iterate over Result nodes directly instead of all nodes.

### Tickets:

- [*CVS-175198*](https://jira.devtools.intel.com/browse/CVS-175198)
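The selection logic described in the commit message can be illustrated with a minimal Python sketch on a toy graph. This is not the OpenVINO API or the actual `cut_lm_head` code: the `Node` class, the `PASS_THROUGH` set, the helper names, and the example shapes are all hypothetical, and "largest matrix size" is assumed here to mean the product of the weight dimensions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Toy graph node (hypothetical, not an OpenVINO type)."""
    kind: str                                   # e.g. "Result", "MatMul", "Convert"
    name: str
    inputs: List["Node"] = field(default_factory=list)
    weight_shape: Optional[Tuple[int, int]] = None  # set only for MatMul nodes


# Assumption: ops that may sit between a MatMul and its Result node.
PASS_THROUGH = {"Convert", "Add", "Transpose"}


def find_matmul_behind(result: Node) -> Optional[Node]:
    """Walk back from a Result node along first inputs until a MatMul is found."""
    node = result.inputs[0] if result.inputs else None
    while node is not None:
        if node.kind == "MatMul":
            return node
        if node.kind not in PASS_THROUGH or not node.inputs:
            return None
        node = node.inputs[0]
    return None


def select_last_match(results: List[Node]) -> Optional[Node]:
    """Old behavior in spirit: every match overwrites the previous one."""
    chosen = None
    for r in results:
        m = find_matmul_behind(r)
        if m is not None:
            chosen = m  # last match wins
    return chosen


def select_lm_head(results: List[Node]) -> Optional[Node]:
    """New behavior in spirit: collect all candidates, keep the largest matrix."""
    candidates = [m for m in map(find_matmul_behind, results)
                  if m is not None and m.weight_shape is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.weight_shape[0] * m.weight_shape[1])


# Toy model with two outputs: logits (vocabulary matrix) and an extra
# feature output, similar in spirit to the additional Eagle 3 output.
hidden = Node("Parameter", "hidden")
lm_head = Node("MatMul", "lm_head", [hidden], weight_shape=(32000, 4096))
feature_proj = Node("MatMul", "feature_proj", [hidden], weight_shape=(4096, 4096))
logits = Node("Result", "logits", [Node("Convert", "to_f32", [lm_head])])
features = Node("Result", "features", [feature_proj])
results = [logits, features]

print(select_last_match(results).name)  # feature_proj
print(select_lm_head(results).name)     # lm_head
```

In this toy setup the last-match strategy depends on output order and picks the feature projection, while the size-based selection picks the vocabulary matrix regardless of order.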

References

#32475 - [NPUW] Fix multiple MatMul matching issue in NPUW LM head cutting

Author

GuoliangShiIntel

Parents

4832243b
