openvino
aa0cbb42 - [GPU] Optimize gen9 common f32 conv kernel for batch 32 large 1d input (#32364)

### Description of the issue (symptom, root cause, how it was resolved)
- A customer model runs quite slowly in f32 inference mode; it contains a batch=32, large 1D convolution executed on DG2.
- Optimized `gen9_common_conv_kernel_f32` for this case.

#### The code and line that caused this issue (if it is not changed directly)
- src/plugins/intel_gpu/src/kernel_selector/cl_kernels/gen9_common_conv_fwd_data_f32.cl

#### Reproduction step and snapshot (if applicable; do not attach for customer models)
- `$ benchmark_app -d GPU -m emb.xml -infer_precision f32`

#### Problematic graph
<img width="210" height="176" alt="image" src="https://github.com/user-attachments/assets/c4c3904d-f6f7-4c71-96bf-faffa1c0af4f" />

#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [x] Did you include a test case for this fix, if necessary?
- [x] Did you review existing tests that can be extended to cover this scenario? Which test did you review?

### Tickets:
- 173214
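For context, the computation this OpenCL kernel performs is a direct f32 forward convolution; for the problematic shape it is effectively a 1D convolution over a large spatial extent with batch 32. The sketch below is not the OpenVINO kernel itself, just a minimal NumPy reference of that computation (shape convention `N, C, W` and function name are illustrative assumptions):

```python
import numpy as np

def conv1d_fwd_f32(src, wei, bias, stride=1):
    """Naive direct 1D forward convolution in f32 (reference only).

    src:  (N, IC, IW)  input, e.g. N=32 for the batch-32 case
    wei:  (OC, IC, KW) weights
    bias: (OC,)
    Returns dst of shape (N, OC, OW) with OW = (IW - KW) // stride + 1.
    """
    n, ic, iw = src.shape
    oc, _, kw = wei.shape
    ow = (iw - kw) // stride + 1
    dst = np.empty((n, oc, ow), dtype=np.float32)
    for b in range(n):          # batch
        for o in range(oc):     # output channel
            for x in range(ow): # output position along the 1D axis
                acc = np.float32(bias[o])
                # accumulate over input channels and kernel taps
                for c in range(ic):
                    for k in range(kw):
                        acc += src[b, c, x * stride + k] * wei[o, c, k]
                dst[b, o, x] = acc
    return dst
```

A GPU kernel typically parallelizes the outer `b`/`o`/`x` loops across work-items and optimizes the inner accumulation (blocking, vector loads); the batch=32 large-1D case stresses a different blocking shape than the common 2D case, which is the kind of gap this commit addresses.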