[GPU] Fix onednn concat validation for non-block-aligned feature in blocked formats (#34506)
### Details:
- Fix NaN output in onednn concat layer when input feature dimension is
not aligned to the block size in blocked memory formats.
### Description of the issue (symptom, root-cause, how it was resolved)
- **Symptom**: The TF_Separate_Bass model produces NaN values on GPU with
FP16 precision at the concat:Transpose_125956059 layer inside a Loop
sub-graph. Two clean inputs of shape [2,24,16,256] in f16 b_fs_yx_fsv16
format are concatenated along the feature axis into [2,48,16,256], but the
output contains more than 800 NaN values.
- **Root Cause**:
- The concat layer's two inputs have feature=24 in b_fs_yx_fsv16 (block
size=16) format, where 24 % 16 != 0 (not block-aligned)
- The validate_impl() in concatenation_onednn.hpp checks output feature
alignment (is_feature_aligned(out_layout)) but does not check input
feature alignment
- Output feature 48 is aligned (48 % 16 == 0), so the check passes, and
onednn concat is selected
- The onednn concat kernel has a bug handling non-block-aligned input
features in blocked formats, causing data corruption at block boundaries
- Static models are not affected: build-time allocation always
zero-fills the padding, so the padded lanes hold zeros rather than stale data.
- **Resolution**:
- In concatenation_onednn.hpp validate_impl(), add an
`if (node.is_dynamic() && !is_feature_aligned(in_layout))` check for all
input layouts inside the dependency loop, consistent with the existing
output layout check
- This ensures onednn concat is rejected only when the combination of
dynamic memory reuse (no zero-fill) and non-block-aligned input features
would produce incorrect results. Static models retain the onednn path
and are unaffected performance-wise.
- When onednn is rejected for non-block-aligned inputs, the framework
falls back to the OCL concat, which handles this case correctly
- Added unit test concat_gpu_onednn.dynamic_non_block_aligned_feature to
verify the fix
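
The alignment arithmetic behind the root cause and the resolution can be sketched as a small model. This is an illustrative Python sketch, not the plugin code: `Layout`, `padded_feature`, `validate_before_fix`, and `validate_after_fix` are hypothetical names, and only the feature-alignment part of `validate_impl()` is modeled (any other checks the real function performs are omitted).

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # feature block size of b_fs_yx_fsv16


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for a layout: logical shape plus feature block size."""
    shape: tuple  # (batch, feature, y, x)
    block: int = BLOCK_SIZE

    @property
    def feature(self):
        return self.shape[1]


def is_feature_aligned(layout):
    # Aligned when the feature count fills whole blocks with no padded lanes.
    return layout.feature % layout.block == 0


def padded_feature(layout):
    # Feature count rounded up to the next multiple of the block size.
    return -(-layout.feature // layout.block) * layout.block


def validate_before_fix(inputs, output, is_dynamic):
    # Assumed model of the old behavior: only the output layout is checked.
    return is_feature_aligned(output)


def validate_after_fix(inputs, output, is_dynamic):
    # Assumed model of the fix: dynamic nodes also reject unaligned inputs.
    if not is_feature_aligned(output):
        return False
    for in_layout in inputs:
        if is_dynamic and not is_feature_aligned(in_layout):
            return False
    return True


inputs = [Layout((2, 24, 16, 256)), Layout((2, 24, 16, 256))]
output = Layout((2, 48, 16, 256))

print(padded_feature(inputs[0]))                   # 32: 24 features occupy two blocks
print(validate_before_fix(inputs, output, True))   # True: onednn concat was selected
print(validate_after_fix(inputs, output, True))    # False: dynamic case falls back
print(validate_after_fix(inputs, output, False))   # True: static case keeps onednn
```

Each input carries 32 - 24 = 8 padded feature lanes per spatial position, which is the region the dynamic (non-zero-filled) case leaves uninitialized.
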
#### The code and line that caused this issue (if it is not changed directly)
https://github.com/openvinotoolkit/openvino/blob/81bb2f9d63fefa933a5aec40a6560364bb392a2b/src/plugins/intel_gpu/src/graph/impls/onednn/concatenation_onednn.hpp#L87-L100
#### Reproduction step and snapshot (if applicable. Do not attach for customer model)
python -m pytest test_ovc_mo.py \
-n 2 \
--tb=native \
--env_conf=.automation/env_config.yml \
--test_conf=.automation/test_configs/desktop_test_config_gpu_llm.yml \
-m "not launch_only_if_manually_specified" \
--pregen_irs=models/irs_mapping.csv \
--tf_models_version=1.15.2 \
--modules pipelines/production/tf/light \
-k "TF_Ssd_Inception_v2_coco_api_2_True" \
--dynamism_type=None \
--log-cli-level INFO
#### Problematic graph
- Original IR
<img width="2761" height="1138" alt="image"
src="https://github.com/user-attachments/assets/3c005f57-f204-483a-96c5-bed751ca56ff"
/>
- Current IR
<img width="3157" height="1123" alt="image"
src="https://github.com/user-attachments/assets/3650ab8d-987c-4598-91c0-dcacf7b046bb"
/>
#### Checklist
- [v] Is it a proper fix? (not a workaround)
- [v] Did you include test case for this fix, if necessary?
- [v] Did you review existing test that can be extended to cover this
scenario? Which test did you review?
### Tickets:
- *CVS-181149*
---------
Signed-off-by: zhanmyz <yazhan.ma@intel.com>