openvino
29da988d - [GPU] Optimize MVN and reorders for nnUNet INT8 5D model (#34949)

Commit
23 days ago
### Summary

Optimizes the OpenVINO GPU plugin for nnUNet INT8 5D inference by reducing redundant reorders, extending shape-agnostic coverage of blocked reorder kernels, and preventing a oneDNN deconvolution fallback to the reference kernel. On Intel Arc B390 (DUT4580PTLH), end-to-end inference latency drops from *18509 ms → 3781 ms* (*4.89×*).

### Changes (11 commits, TEST → IMPL paired)

| Opt | TEST commit | IMPL commit | Scope |
|-----|-------------|-------------|-------|
| 1 | `44b7c06d89` | `5d720c278e` | Prevent oneDNN deconv from selecting `ocl:ref` kernel |
| 2 | `6fc39924b7` | `2d2d2566bd` | MVN fsv16↔fsv32 cross-layout fusing; dynamic-shape MVN b_fs_yx_fsv16; 5D int8 concat preferred format |
| 3 | `0452e8fe72` | `0a8ab0a467` | Dynamic-shape support for `reorder_data_bfyx_to_blocked_format` |
| 4 | `87caeb366f` | `06fd68db38` + `e42585d73a` | New `reorder_data_fsv` kernel for blocked↔blocked fsv conversion + vload/vstore vectorization |
| 5 | `f1c3cc437d` | `2bbcf03107` | Rename `_imad` kernel to `mvn_gpu_b_fs_yx_fsv16` (no longer int-only); extend dynamic reorder registry with the blocked formats the new kernels serve |

### Graph-level impact (main program final stage)

| Stage | total nodes | reorder | mvn |
|-------|---:|---:|---:|
| master (baseline) | 311 | 56 | 22 |
| after Opt1 | 348 | 56 | 22 |
| after Opt2+ | **305** | **13** | 22 |

Opt2 removes 43 reorders by allowing MVN to accept cross-layout fsv16/fsv32 input/output (the consumer-direction rule is symmetric to the existing producer-direction rule in `can_fuse_reorder_to_prev`).
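For readers unfamiliar with the blocked formats, the following is a minimal Python sketch (not plugin code) of how an fsv16 and an fsv32 buffer address the same logical element, and what a blocked↔blocked reorder has to do. The helper names `blocked_offset` and `reorder_blocked` are hypothetical, the layout is 4D (`b_fs_yx_fsv<N>`) rather than 5D for brevity, and no padding is assumed beyond rounding the feature count up to a whole block.

```python
def blocked_offset(b, f, y, x, *, features, height, width, block):
    """Linear offset of logical element (b, f, y, x) in b_fs_yx_fsv<block>.

    Assumption: the only padding is rounding `features` up to a whole
    number of blocks; real tensors may carry extra padding.
    """
    slices = -(-features // block)   # ceil(features / block)
    fs, fv = divmod(f, block)        # feature slice, index inside the slice
    return (((b * slices + fs) * height + y) * width + x) * block + fv


def reorder_blocked(src, *, batch, features, height, width, src_block, dst_block):
    """Copy a flat buffer from fsv<src_block> to fsv<dst_block> (hypothetical helper)."""
    dst_slices = -(-features // dst_block)
    dst = [0] * (batch * dst_slices * height * width * dst_block)
    dims = dict(features=features, height=height, width=width)
    for b in range(batch):
        for f in range(features):
            for y in range(height):
                for x in range(width):
                    s = blocked_offset(b, f, y, x, block=src_block, **dims)
                    d = blocked_offset(b, f, y, x, block=dst_block, **dims)
                    dst[d] = src[s]  # same logical element, different address
    return dst
```

With 32 features and a 2×2 spatial extent, feature 17 at the origin sits at offset 65 in fsv16 but at offset 17 in fsv32. The reorder is a pure permutation of the data, which is why letting MVN read or write the other block size directly can make the standalone reorder node unnecessary.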
### E2E latency on Intel Arc B390 GPU (96 CUs, 2500 MHz), nnUNet INT8 5D

| Build | Avg [ms] | Device total [s] | vs master |
|-------|---:|---:|---:|
| master | 18509 | 50.54 | 1.00× |
| + Opt1 | 16639 | 44.65 | 1.11× |
| + Opt2 | 6445 | 13.67 | 2.87× |
| + Opt3 | 6763 | 13.71 | 2.74× (noise) |
| + Opt4 | **3781** | **5.79** | **4.89×** |
| + Opt5 | 4137 | 6.84 | 4.47× (noise) |

### Tickets

- 182677

### AI Assistance

- AI assistance used: yes
- AI: root-cause analysis, patch generation, kernel vectorization
- User: design decisions, build, validation

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>