openvino
ed38898b - [GPU] Fix TopK radix kernel priority for small sort sizes and k=1 (#34789)

Commit
51 days ago
[GPU] Fix TopK radix kernel priority for small sort sizes and k=1 (#34789)

backport: https://github.com/openvinotoolkit/openvino/pull/34778

### Description of the issue(symptom, root-cause, how it was resolved)

#### Symptom
After https://github.com/openvinotoolkit/openvino/pull/34539, deeplabv3 and efficientdet-d0 show a large performance degradation related to TopK.

#### Root cause
arg_max_min_topk_radix was unconditionally given the highest kernel priority.

The radix kernel combines a two-level histogram radix select with an SLM bitonic sort, which is effective for large sort sizes and large k. For small sort sizes or small k, the fixed overhead of multiple full-data passes (caching, coarse/fine histogram, gather) and WG-level barriers dominates, making it slower than arg_max_min_axis.

This caused regressions in two cases:
- topK == 1 (argmax): multi-pass radix is overkill compared with a simple reduction.
- sort_size < 256 (below WG_SIZE): most threads sit idle while still paying the full barrier and global memory overhead per pass.

#### Resolution
This patch lowers the radix kernel's priority below that of arg_max_min_axis for these two cases.

#### Reproduction step and snapshot (if applicable. Do not attach for customer model)
$ ./benchmark_app -m deeplabv3.xml -d GPU.1 -t 5
$ ./benchmark_app -m efficientdet-d0.xml -d GPU.1 -t 5

#### Checklist
- [x] Is it a proper fix?
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:
- *CVS-183209*

Signed-off-by: hyunback <hyunback.kim@intel.com>
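
The priority rule described under Resolution can be illustrated with a minimal sketch. This is not the code from the patch: the enum `KernelPriority`, the helpers `radix_priority`, `axis_priority`, and `radix_preferred`, and the constant `kAssumedWgSize` are hypothetical names chosen for illustration, and the convention that a lower number means a more preferred kernel is an assumption. Only the two conditions (k equal to 1, or sort size below 256) come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical priority levels (assumption: a lower value is preferred).
enum class KernelPriority : int { High = 1, Default = 5, Low = 9 };

// Threshold from the commit message: sort_size < 256 is "below WG_SIZE".
constexpr std::size_t kAssumedWgSize = 256;

// In this sketch the simple reduction kernel (arg_max_min_axis)
// keeps one fixed priority.
constexpr KernelPriority axis_priority() {
    return KernelPriority::Default;
}

// The radix kernel gets the top priority only when neither regression
// case applies; otherwise it is ranked below the axis kernel.
constexpr KernelPriority radix_priority(std::size_t top_k, std::size_t sort_size) {
    return (top_k == 1 || sort_size < kAssumedWgSize) ? KernelPriority::Low
                                                      : KernelPriority::High;
}

// True when the selector in this sketch would pick the radix kernel.
constexpr bool radix_preferred(std::size_t top_k, std::size_t sort_size) {
    return static_cast<int>(radix_priority(top_k, sort_size)) <
           static_cast<int>(axis_priority());
}

// Compile-time checks of the two cases named in the commit message.
static_assert(!radix_preferred(1, 4096), "k == 1 should fall back to the axis kernel");
static_assert(!radix_preferred(8, 128), "small sort sizes should fall back to the axis kernel");
static_assert(radix_preferred(8, 4096), "large k and large sort size should keep the radix kernel");
```

Expressing the rule as a pure function of `top_k` and `sort_size` keeps both conditions in one place and makes the boundary at 256 easy to cover with a test.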