[ROCm] use cupy for GPU-accelerated computing (#16611)
Kernel explorer has many tests that rely on `numpy` to verify the results
of GPU kernels, which drives CPU utilization very high. This PR uses
`cupy` in place of `numpy` so that the verification runs on the GPU,
reducing CPU utilization.
Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to enable `cupy`; it is off by default.
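
A minimal sketch of how a test helper might pick the array module from that environment variable. The function names `array_module_name` and `load_array_module` are hypothetical and not taken from this PR, and the sketch assumes that only the value `1` enables `cupy`; the actual kernel explorer code may read the variable differently.

```python
import importlib
import os


def array_module_name(env=None):
    """Return "cupy" when KERNEL_EXPLORER_TEST_USE_CUPY is "1", else "numpy".

    Assumption: any other value, or an unset variable, keeps numpy.
    """
    env = os.environ if env is None else env
    if env.get("KERNEL_EXPLORER_TEST_USE_CUPY") == "1":
        return "cupy"
    return "numpy"


def load_array_module(env=None):
    """Import and return the selected module (hypothetical helper).

    Raises ImportError if the selected package is not installed.
    """
    return importlib.import_module(array_module_name(env))
```

A test could then write `xp = load_array_module()` and compute its reference result with `xp.asarray(...)`, `xp.exp(...)` and similar calls that `numpy` and `cupy` share, so the same verification code runs on either backend.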