[CUDA] fp16 intB gemm scale only kernel (#24955)
### Description
* Enable fp16 intB gemm kernels when zero points are not provided.
* Minor changes to `fpA_intB_gemv/dispatcher.h` to fix a build error for
sm < 5.3.
* Minor changes to `fpA_intB_gemm_preprocessors_impl.h` to fix
unreachable code warnings in debug builds.
Note that existing test cases such as
`MatMulNBits.Fp16_Int4_NoZeroPoint` already cover the scale-only path.
### Motivation and Context
The zero point input is optional for MatMulNBits. In
https://github.com/microsoft/onnxruntime/pull/24854, fp16 intB gemm was
only enabled when zero points are provided.
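
For readers unfamiliar with the scale-only case, the following is a minimal host-side sketch of what dequantization looks like with and without an explicit zero point. It is not the kernel code from this PR. It assumes the MatMulNBits convention that a missing zero point defaults to the midpoint of the unsigned range, `2^(bits - 1)` (8 for 4-bit weights), and it uses `float` instead of fp16 for portability. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default zero point when the optional zero_points input is absent:
// the midpoint of the unsigned range, 2^(bits - 1), i.e. 8 for 4-bit weights.
constexpr int DefaultZeroPoint(int bits) { return 1 << (bits - 1); }

// Dequantize one quantized weight using an explicit zero point.
inline float DequantizeWithZeroPoint(std::uint8_t q, float scale,
                                     std::uint8_t zero_point) {
  return (static_cast<int>(q) - static_cast<int>(zero_point)) * scale;
}

// Scale-only dequantization: no zero point is supplied, so the assumed
// default is subtracted instead. (Hypothetical helper, not an ORT API.)
inline float DequantizeScaleOnly(std::uint8_t q, float scale, int bits = 4) {
  assert(q < (1 << bits));  // q must fit in the quantized bit width
  return (static_cast<int>(q) - DefaultZeroPoint(bits)) * scale;
}
```

Under this assumption, the scale-only path is equivalent to the zero-point path with a constant zero point of 8 for int4, which is why a kernel can skip loading the zero point tensor entirely.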