openvino
10027c8d - [GPU] NormalizeL2Decomposition fp32 inner nodes to avoid fp16 range overflow from reducesum (#31623)

Commit
270 days ago
[GPU] NormalizeL2Decomposition fp32 inner nodes to avoid fp16 range overflow from reducesum (#31623)

### Description of the issue (symptom, root cause, how it was resolved)
- An fp16 range overflow occurs in the ReduceSum layer of the NormalizeL2 decomposition subgraph. It causes an accuracy failure in a customer model.
- Use the decomposition with fp32 internal nodes instead of the ref kernel. It has slightly better performance on the target model (42 fps).
- The oneDNN reduction primitive supports fp32 src/dst. Removed the fp32 limitation in ReduceImplementationManager.

#### The code and line that caused this issue (if it is not changed directly)
- [src/plugins/intel_gpu/src/kernel_selector/cl_kernels/normalize_gpu_within_spatial_ref.cl](https://github.com/openvinotoolkit/openvino/blob/7e847ed3db46004f78ce4e2008e03cca534d2050/src/plugins/intel_gpu/src/plugin/transformations_pipeline.cpp#L853)

#### Reproduction step and snapshot (if applicable. Do not attach for customer model)
- $ ./benchmark_app -d GPU.1 -m ~/task/blackmagic/RealWeightsIR/MusicRetimer/bt.xml -i ~/task/blackmagic/InputNpys/MusicRetimer_bt_input_0.npy

#### Problematic graph
- Decomposition subgraph
<img width="952" height="991" alt="image" src="https://github.com/user-attachments/assets/603e715a-302d-4a7e-92ac-0ed8a4bd8725" />
- Decomposition with fp32 nodes
<img width="769" height="780" alt="image" src="https://github.com/user-attachments/assets/cdcca658-8da4-4ec2-8d90-f50b24504536" />

#### Checklist
- [v] Is it a proper fix? (not a workaround)
- [v] Did you include test case for this fix, if necessary?
- [v] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:
- 163878
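
#### Illustration of the overflow (sketch, not plugin code)
The overflow described above can be reproduced outside the plugin. The following is a minimal sketch, not the OpenVINO implementation: it emulates fp16 and fp32 rounding with Python's `struct` module and compares an L2 normalization whose sum of squares is kept in fp16 against one whose inner nodes run in fp32. The function names, the `eps` value, and the "add eps, then sqrt" form are assumptions made for illustration only.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def _round_to(fmt: str, x: float) -> float:
    """Round x to the given struct float format; out-of-range becomes +/-inf."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_fp16(x: float) -> float:
    return _round_to("<e", x)


def to_fp32(x: float) -> float:
    return _round_to("<f", x)


def normalize_l2(values, inner, eps=1e-6):
    """L2-normalize fp16 inputs; `inner` rounds every intermediate result.

    Hypothetical helper: `inner=to_fp16` models an all-fp16 subgraph,
    `inner=to_fp32` models fp32 inner nodes with fp16 input and output.
    """
    xs = [to_fp16(v) for v in values]          # fp16 input tensor
    total = 0.0
    for x in xs:                                # the "ReduceSum" of squares
        total = inner(total + inner(x * x))
    norm = inner(math.sqrt(inner(total + eps)))
    return [to_fp16(x / norm) for x in xs]      # fp16 output tensor


data = [300.0, 400.0]                           # 300**2 = 90000 > FP16_MAX
all_fp16 = normalize_l2(data, inner=to_fp16)    # sum of squares overflows to inf
mixed = normalize_l2(data, inner=to_fp32)       # sum of squares stays finite
print(all_fp16, mixed)
```

In the all-fp16 path the squared values already exceed the fp16 range, so the reduction saturates to infinity and every output collapses to zero. With fp32 intermediates the same input normalizes to approximately 0.6 and 0.8 after the final conversion back to fp16.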