Backport gather fix 2026.1 (#34907)
backport: https://github.com/openvinotoolkit/openvino/pull/34897
### Description of the issue(symptom, root-cause, how it was resolved)
#### Symptom
Low similarity score with granite-4.0-h-micro on GPU.
#### Root cause
Fusing post-ops into rank-changing gather can generate incorrect index
mapping, causing output mismatches.
When gather decreases rank (e.g., 5D->4D), static_canonicalize_shapes
pads the output back to 5D by inserting dim=1 at the gather axis (e.g.,
{-1,64,64,128} -> {-1,1,64,64,128}).
However, the fused eltwise peer tensor remains 4D. In the jitter, GetIdx
selects index slots based on the peer tensor's rank (4D -> b,f,y,x), so
the kernel's z loop variable, which iterates over the actual data, is
never used for peer indexing. The fused eltwise therefore reads
incorrect data: the f slot always maps to 0 (the padded dimension)
instead of the actual data dimension.
#### Resolution
Disable post-op fusion into gather when it decreases rank from input to
output, while keeping scalar eltwise cases as a safe exception.
Although eltwise is the root cause in this model, quantize fusion is
also disabled due to potential similar issues.
Gather eltwise post-op fusion when gather decreases rank:
| Post-op | Fused |
|-------------------------|:-----:|
| Eltwise (scalar) | O |
| Eltwise (per-channel) | X |
| Eltwise (full-tensor) | X |
#### Problematic graph
Gather_4: in[1,2,64,64,128] -> out[1,64,64,128] + Multiply_27+Add_9
<img width="1597" height="1081" alt="image"
src="https://github.com/user-attachments/assets/fa3afa2f-39af-4168-b281-0f988e37d3fe"
/>
#### Reproduction step and snapshot (if applicable. Do not attach for
customer model)
```shell
$ python ./tools/who_what_benchmark/whowhatbench/wwb.py --target-model \
    /mnt/models/ov-share-13.iotg.sclab.intel.com/cv_bench_cache/WW11_llm-optimum_2026.1.0-21296/granite-4.0-h-micro/pytorch/ov/FP16 \
    --gt-data \
    /mnt/models/ov-share-04.iotg.sclab.intel.com/cv_bench_cache/AC_llm/wwb_ref_gt_data_cache/2026.1.0-21296-4589d335731_nat_ref/CPU_ICX/default_data_wwb/cache_nat_refs_cli/granite-4.0-h-micro__NAT/reference.csv \
    --model-type text --genai --device GPU.1 --output ./wwb --verbose
```
#### Checklist
- [ ] Is it a proper fix? The fundamental fix would be to make the peer
tensor's rank match the gather output and index it accordingly.
- [x] Did you include test case for this fix, if necessary? Yes
- [x] Did you review existing test that can be extended to cover this
scenario? Which test did you review? gather_fusion_test
### Tickets:
- *CVS-183103*
---------
Signed-off-by: hyunback <hyunback.kim@intel.com>