[GPU] Fix TopK radix kernel priority for small sort sizes and k=1 (#34789)
backport: https://github.com/openvinotoolkit/openvino/pull/34778
### Description of the issue (symptom, root cause, and how it was resolved)
#### Symptom
After https://github.com/openvinotoolkit/openvino/pull/34539,
deeplabv3 and efficientdet-d0 showed a large performance regression
in TopK.
#### Root cause
arg_max_min_topk_radix was unconditionally given the highest kernel
priority.
The radix kernel (a two-level histogram radix select followed by an SLM
bitonic sort) is effective for large sort sizes and large k.
For small sort sizes or small k, however, the fixed overhead of multiple
full-data passes (caching, coarse/fine histogram, gather) and WG-level
barriers dominates, making it slower than arg_max_min_axis.
This caused regressions in two cases:
- topK == 1 (argmax): a multi-pass radix select is overkill compared with
  a simple reduction.
- sort_size < 256 (below WG_SIZE): most threads are idle, yet every pass
  still pays the full barrier and global-memory overhead.
#### Resolution
This patch lowers the radix kernel's priority below arg_max_min_axis in
these two cases, so arg_max_min_axis is preferred for them.
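The selection rule can be sketched as follows. This is a minimal, hypothetical model rather than the kernel_selector code: the `Priority` and `Kernel` enums, their numeric values, and the `kWgSize` constant are assumptions, with 256 taken from the sort_size < 256 condition above and the convention that a numerically smaller priority is tried first.

```cpp
#include <cstddef>

// Hypothetical priority scale: a smaller value is tried first by the selector.
enum class Priority : int { Highest = 1, Axis = 3, BelowAxis = 4 };

// Hypothetical identifiers for the two candidate kernels.
enum class Kernel { TopkRadix, Axis };

// Assumed work-group size; sort_size below this leaves most work-items idle.
constexpr std::size_t kWgSize = 256;

// Priority of the radix kernel for a given problem shape.
constexpr Priority radix_priority(std::size_t sort_size, std::size_t top_k) {
    // k == 1 is argmax/argmin: one reduction beats a multi-pass radix select.
    if (top_k == 1) return Priority::BelowAxis;
    // Too few elements to amortize the histogram passes and WG barriers.
    if (sort_size < kWgSize) return Priority::BelowAxis;
    // Large sort size and k > 1: radix keeps the top spot.
    return Priority::Highest;
}

// Prefer whichever kernel has the numerically smaller priority.
constexpr Kernel select_kernel(std::size_t sort_size, std::size_t top_k) {
    return static_cast<int>(radix_priority(sort_size, top_k)) <
                   static_cast<int>(Priority::Axis)
               ? Kernel::TopkRadix
               : Kernel::Axis;
}
```

Under this model, sort_size == 256 with k > 1 still selects the radix kernel, because the condition is the strict sort_size < 256.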
#### Reproduction step and snapshot (if applicable. Do not attach for customer model)
```sh
$ ./benchmark_app -m deeplabv3.xml -d GPU.1 -t 5
$ ./benchmark_app -m efficientdet-d0.xml -d GPU.1 -t 5
```
#### Checklist
- [x] Is it a proper fix?
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing tests that can be extended to cover this scenario? Which test did you review?
### Tickets:
- *CVS-183209*
Signed-off-by: hyunback <hyunback.kim@intel.com>