[GPU] Disable in-place crop when consumer is MVN requiring alignment (#35700)
### Description of the issue (symptom, root cause, how it was resolved)
- Symptom: GPU inference produced incorrect results for an IR pattern
`Parameter → Slice → MVN(axis=-1)`. Compared to CPU, the max absolute
diff was ~2.97 (FP32). Inspecting the MVN output buffer showed that only
the first row was correct; rows 1..N contained garbage (data read from
outside the intended sub-view).
- Root cause: Two GPU graph optimizations make conflicting assumptions
about the MVN input buffer layout:
  - `crop_in_place_optimization` rewrites the crop (Slice) so that its
  output is a **strided sub-view** of the parent buffer (e.g.
  `BATCH_PITCH=1280` for `[1,2,2,32]` cropped from `[3,4,5,64]`).
  - The MVN OCL impl's `static_canonicalize_shapes` flattens the input
  shape (e.g. `[1,2,2,32] → [4,1,1,32]`), and the kernel reads with pitches
  computed for that contiguous flattened shape (`BATCH_PITCH=32`). The two
  pitch sets disagree, so only row 0 is read from the intended location.
- Fix: In `crop_in_place_optimization::match()`, skip the in-place
optimization when any consumer is an `mvn` whose
`requires_alignment(crop_layout.get_partial_shape())` is `true`. The
crop then materializes a contiguous output buffer that matches MVN's
pitch assumptions. After the fix, max diff drops to ~1.4e-3 (FP16 noise
level).
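
The pitch disagreement described in the root cause can be checked with simple arithmetic. The following is a minimal sketch, not plugin code: the helper names are hypothetical, and it assumes the crop starts at the origin of the parent buffer (the crop offsets are not stated here). It computes where each of the four rows starts under the parent's strides versus under the canonicalized `[4,1,1,32]` shape.

```cpp
#include <array>
#include <cstddef>

using Dims = std::array<std::size_t, 4>;  // order: batch, feature, y, x

// Element pitches of a dense row-major 4D buffer with the given dims.
inline Dims dense_pitches(const Dims& dims) {
    return Dims{dims[1] * dims[2] * dims[3], dims[2] * dims[3], dims[3], 1};
}

// Linear element offset of a coordinate under a given set of pitches.
inline std::size_t offset_of(const Dims& coord, const Dims& pitches) {
    return coord[0] * pitches[0] + coord[1] * pitches[1] +
           coord[2] * pitches[2] + coord[3] * pitches[3];
}

// Start of row `row` (0..3) of the [1,2,2,32] crop when it is a strided
// sub-view of the [3,4,5,64] parent. Assumption: the crop begins at the origin.
inline std::size_t row_start_in_place(std::size_t row) {
    const Dims parent = dense_pitches(Dims{3, 4, 5, 64});  // batch pitch 1280
    return offset_of(Dims{0, row / 2, row % 2, 0}, parent);
}

// Start of the same row when addressed through the contiguous,
// canonicalized [4,1,1,32] shape.
inline std::size_t row_start_canonical(std::size_t row) {
    const Dims flat = dense_pitches(Dims{4, 1, 1, 32});  // batch pitch 32
    return offset_of(Dims{row, 0, 0, 0}, flat);
}
```

Under these assumptions the two addressing schemes agree only for row 0; rows 1 to 3 start at element offsets 64, 320 and 384 in the strided view but at 32, 64 and 96 under the canonicalized shape, which is consistent with only the first row being correct.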
#### The code and line that caused this issue (if it is not changed directly)
- `src/plugins/intel_gpu/src/graph/impls/ocl/mvn.cpp`:
`mvn_impl::static_canonicalize_shapes()` flattens dims without
accounting for the actual buffer strides of the input.
- `src/plugins/intel_gpu/src/graph/graph_optimizer/prepare_buffer_fusing.cpp`:
`crop_in_place_optimization::match()` did not exclude MVN consumers
that require alignment.
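
The shape of the added guard can be illustrated with a toy model. This is a sketch under stated assumptions: `ToyConsumer` and `toy_crop_may_run_in_place` are hypothetical stand-ins and do not reflect the actual intel_gpu node classes or the real signature of `crop_in_place_optimization::match()`. The `requires_alignment` field stands in for the result of evaluating the MVN's alignment check on the crop's output shape.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a consumer node of the crop; NOT the plugin's API.
struct ToyConsumer {
    std::string type;         // primitive type name, e.g. "mvn" or "eltwise"
    bool requires_alignment;  // stand-in for the MVN alignment check result
};

// Returns true when the crop may remain an in-place (strided) view, and
// false when any consumer is an MVN that requires alignment, in which case
// the crop should materialize a contiguous output buffer instead.
inline bool toy_crop_may_run_in_place(const std::vector<ToyConsumer>& consumers) {
    const bool blocked = std::any_of(
        consumers.begin(), consumers.end(), [](const ToyConsumer& c) {
            return c.type == "mvn" && c.requires_alignment;
        });
    return !blocked;
}
```

The check is deliberately conservative in this sketch: a single alignment-requiring MVN among several consumers is enough to disable the in-place path, matching the "any consumer" wording of the fix.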
#### Reproduction step and snapshot (if applicable. Do not attach for customer model)
- Unit test: `./ov_gpu_unit_tests --gtest_filter=*crop_then_mvn_last_axis_contiguous_input*`
  - Without the fix: `FAIL` (mismatch starting at index 0)
  - With the fix: `PASS`
#### Problematic graph
```
Parameter [3,4,5,64]
│
▼
Slice ──(in-place, strided sub-view: BATCH_PITCH=1280)──▶ [1,2,2,32]
│
▼
MVN(axes=[-1]) ──(canonicalized to [4,1,1,32], expects BATCH_PITCH=32)
│
▼
Result ◀── only row 0 correct; rows 1..3 read OOB memory
```
#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [x] Did you include test case for this fix, if necessary?
  - `mvn_gpu_test.crop_then_mvn_last_axis_contiguous_input` (fails on
  master, passes with fix).
- [x] Did you review existing test that can be extended to cover this
scenario? Which test did you review?
  - Reviewed `mvn_random_test*` and crop in-place tests in
  `crop_gpu_test.cpp`; none combine an in-place crop with an
  alignment-requiring MVN consumer, so a new dedicated test was added.
### Tickets:
- [CVS-185315](https://jira.devtools.intel.com/browse/CVS-185315)