[ROCm EP/MIGraphX EP] matmul_nbits: Use GPU_WARP_SIZE_HOST for host side code (#22045)
### Description
On ROCm devices, host-side code needs to use GPU_WARP_SIZE_HOST to query
the warpSize of the underlying GPU device.
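
A minimal sketch of why the host side needs the runtime value, assuming a kernel whose block x-dimension must span exactly one hardware warp. The names `LaunchShape`, `MakeLaunchShape`, `SpansOneWarp` and `kAssumedWarpSize` are hypothetical and are not taken from the ONNX Runtime sources. The warp size is passed in as a plain integer so the snippet compiles without ROCm; in real host code that value would come from GPU_WARP_SIZE_HOST.

```cpp
#include <cassert>

// Hypothetical compile-time guess; wrong on devices whose warpSize is 32.
constexpr int kAssumedWarpSize = 64;

// Hypothetical description of a 2D thread block plus a 1D grid.
struct LaunchShape {
  int threads_x;  // lanes cooperating on one output column (one warp)
  int threads_y;  // output columns handled per block
  int blocks;     // blocks needed to cover all columns
};

// Builds the launch shape from a warp size supplied by the caller.
// Host code would pass the value queried from the device at runtime.
LaunchShape MakeLaunchShape(int num_cols, int cols_per_block, int warp_size) {
  assert(num_cols > 0 && cols_per_block > 0 && warp_size > 0);
  LaunchShape shape;
  shape.threads_x = warp_size;
  shape.threads_y = cols_per_block;
  // Ceiling division so a partial last block is still launched.
  shape.blocks = (num_cols + cols_per_block - 1) / cols_per_block;
  return shape;
}

// True when the block's x-dimension matches the device's actual warp width.
bool SpansOneWarp(const LaunchShape& shape, int device_warp_size) {
  return shape.threads_x == device_warp_size;
}
```

On a device with a warpSize of 32, `MakeLaunchShape(100, 8, 32)` gives a 32x8 block, while passing `kAssumedWarpSize` gives a 64x8 block whose rows each cover two hardware warps. A kernel that assumes one warp per row would then compute wrong results.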
### Motivation and Context
Fixes the MatMulNBits tests on gfx1100/01, which have a warpSize of 32.
Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>