openvino
f52f6f5d - [GPU] Disable in-place crop when consumer is MVN requiring alignment (#35700)

Commit
24 days ago
### Description of the issue (symptom, root-cause, how it was resolved)

- Symptom: GPU inference produced incorrect results for an IR pattern `Parameter → Slice → MVN(axis=-1)`. Compared to CPU, the max absolute diff was ~2.97 (FP32). Inspecting the MVN output buffer showed that only the first row was correct; rows 1..N contained garbage (data read from outside the intended sub-view).
- Root cause: Two GPU graph optimizations have conflicting assumptions about the MVN input buffer layout:
  - `crop_in_place_optimization` rewrites the crop (Slice) so that its output is a **strided sub-view** of the parent buffer (e.g. `BATCH_PITCH=1280` for `[1,2,2,32]` cropped from `[3,4,5,64]`).
  - The MVN OCL impl's `static_canonicalize_shapes` flattens the input shape (e.g. `[1,2,2,32] → [4,1,1,32]`) and the kernel reads with pitches computed for that contiguous flattened shape (`BATCH_PITCH=32`).

  The two pitch sets disagree, so only row 0 lands on valid memory.
- Fix: In `crop_in_place_optimization::match()`, skip the in-place optimization when any consumer is an `mvn` whose `requires_alignment(crop_layout.get_partial_shape())` is `true`. The crop then materializes a contiguous output buffer that matches MVN's pitch assumptions. After the fix, max diff drops to ~1.4e-3 (FP16 noise level).

#### The code and line that caused this issue (if it is not changed directly)

- `src/plugins/intel_gpu/src/graph/impls/ocl/mvn.cpp`: `mvn_impl::static_canonicalize_shapes()` flattens dims without accounting for the actual buffer strides of the input.
- `src/plugins/intel_gpu/src/graph/graph_optimizer/prepare_buffer_fusing.cpp`: `crop_in_place_optimization::match()` did not exclude MVN consumers that require alignment.

#### Reproduction step and snapshot (if applicable. Do not attach for customer model)

- Unit test: `./ov_gpu_unit_tests --gtest_filter=*crop_then_mvn_last_axis_contiguous_input*`
  - Without the fix: `FAIL` (mismatch starting at index 0)
  - With the fix: `PASS`

#### Problematic graph

```
Parameter [3,4,5,64]
    │
    ▼
Slice ──(in-place, strided sub-view: BATCH_PITCH=1280)──▶ [1,2,2,32]
    │
    ▼
MVN(axes=[-1]) ──(canonicalized to [4,1,1,32], expects BATCH_PITCH=32)
    │
    ▼
Result ◀── only row 0 correct; rows 1..3 read OOB memory
```

#### Checklist

- [x] Is it a proper fix? (not a workaround)
- [x] Did you include test case for this fix, if necessary?
  - `mvn_gpu_test.crop_then_mvn_last_axis_contiguous_input` (fails on master, passes with fix).
- [x] Did you review existing test that can be extended to cover this scenario? Which test did you review?
  - Reviewed `mvn_random_test*` and crop in-place tests in `crop_gpu_test.cpp`; none combine an in-place crop with an alignment-requiring MVN consumer, so a new dedicated test was added.

### Tickets:

- [CVS-185315](https://jira.devtools.intel.com/browse/CVS-185315)
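
The pitch mismatch described under "Root cause" can be checked with plain offset arithmetic. The sketch below is not OpenVINO code: `row_major_pitches` is a hypothetical helper, and it assumes a densely packed row-major parent buffer and a crop that starts at element offset 0 (the commit does not state the crop offsets). It only compares where the four rows of the `[1,2,2,32]` sub-view start in the parent buffer with where a reader using the flattened `[4,1,1,32]` pitches would look for them.

```python
# Illustrative sketch only: plain-Python offset arithmetic, not OpenVINO code.
# Assumption: the crop starts at element offset 0 of the parent buffer.

def row_major_pitches(shape):
    """Element pitches of a densely packed row-major buffer (hypothetical helper)."""
    pitches = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        pitches[i] = pitches[i + 1] * shape[i + 1]
    return pitches

PARENT_SHAPE = [3, 4, 5, 64]   # Parameter
CROP_SHAPE = [1, 2, 2, 32]     # Slice output
FLAT_SHAPE = [4, 1, 1, 32]     # MVN's canonicalized view of the crop

# In-place crop: the sub-view keeps the parent's pitches.
view_pitches = row_major_pitches(PARENT_SHAPE)
# MVN kernel: pitches of a contiguous buffer with the flattened shape.
flat_pitches = row_major_pitches(FLAT_SHAPE)

# Where each of the 4 rows really starts inside the parent buffer.
actual_row_starts = [
    b * view_pitches[0] + f * view_pitches[1] + y * view_pitches[2]
    for b in range(CROP_SHAPE[0])
    for f in range(CROP_SHAPE[1])
    for y in range(CROP_SHAPE[2])
]
# Where a reader using the flattened pitches looks for them.
assumed_row_starts = [row * flat_pitches[0] for row in range(FLAT_SHAPE[0])]

# Rows whose assumed start coincides with the real start.
matching_rows = [
    row
    for row, (actual, assumed) in enumerate(zip(actual_row_starts, assumed_row_starts))
    if actual == assumed
]

print("view pitches      :", view_pitches)
print("flattened pitches :", flat_pitches)
print("actual row starts :", actual_row_starts)
print("assumed row starts:", assumed_row_starts)
print("rows that line up :", matching_rows)
```

Under these assumptions the batch pitches come out as 1280 and 32, matching the values quoted above, and only row 0 starts at the same offset in both views. Once the crop materializes a contiguous `[1,2,2,32]` buffer, its row starts are multiples of 32 and agree with the flattened view.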