onnxruntime
ab5ff6a9 - [CUDA] fp16 intB gemm scale only kernel (#24955)

### Description

* Enable fp16 intB gemm kernels when zero points are not provided.
* Minor changes to `fpA_intB_gemv/dispatcher.h` to fix a build error for sm < 5.3.
* Minor changes to `fpA_intB_gemm_preprocessors_impl.h` to fix unreachable-code warnings in debug builds.

Note that existing test cases such as `MatMulNBits.Fp16_Int4_NoZeroPoint` already cover this path.

### Motivation and Context

The zero point input is optional for MatMulNBits. In https://github.com/microsoft/onnxruntime/pull/24854, fp16 intB gemm was only enabled when zero points are provided.
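Below is a minimal sketch (not taken from the PR) of how a model might exercise the scale-only path: a `MatMulNBits` node in the `com.microsoft` domain built with scales but with the optional `zero_points` input omitted. The tensor shapes, attribute names, and opset versions here are assumptions for illustration, not confirmed by the source.

```python
# Sketch: build a MatMulNBits (com.microsoft) node without zero_points,
# so the runtime must fall back to the scale-only dequantization path.
# Shapes/attributes below are illustrative assumptions.
import numpy as np
from onnx import TensorProto, helper, numpy_helper

K, N, block_size, bits = 64, 32, 32, 4
n_blocks = K // block_size              # blocks per column
blob_size = block_size * bits // 8      # packed bytes per block

a = helper.make_tensor_value_info("A", TensorProto.FLOAT16, [1, K])
y = helper.make_tensor_value_info("Y", TensorProto.FLOAT16, [1, N])

# Packed int4 weights and per-block scales as initializers.
b_quant = numpy_helper.from_array(
    np.zeros((N, n_blocks, blob_size), dtype=np.uint8), name="B_quant")
scales = numpy_helper.from_array(
    np.ones((N * n_blocks,), dtype=np.float16), name="scales")

# zero_points is intentionally omitted from the input list.
node = helper.make_node(
    "MatMulNBits",
    inputs=["A", "B_quant", "scales"],
    outputs=["Y"],
    domain="com.microsoft",
    K=K, N=N, block_size=block_size, bits=bits,
)

graph = helper.make_graph(
    [node], "matmulnbits_no_zero_points", [a], [y],
    initializer=[b_quant, scales])
model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 17),
                   helper.make_opsetid("com.microsoft", 1)])
```

Running such a model through the CUDA execution provider (when the hardware and dtype constraints are met) is the kind of scenario the existing `MatMulNBits.Fp16_Int4_NoZeroPoint` test is said to cover.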