onnxruntime
ab5ff6a9 - [CUDA] fp16 intB gemm scale only kernel (#24955)

### Description

* Enable fp16 intB gemm kernels when zero points are not provided.
* Minor changes to `fpA_intB_gemv/dispatcher.h` to fix a build error for sm < 5.3.
* Minor changes to `fpA_intB_gemm_preprocessors_impl.h` to fix unreachable-code warnings in debug builds.

Note that existing test cases such as `MatMulNBits.Fp16_Int4_NoZeroPoint` already cover this path.

### Motivation and Context

The zero point input is optional for MatMulNBits. In https://github.com/microsoft/onnxruntime/pull/24854, fp16 intB gemm was only enabled when zero points are provided.
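Below is a minimal sketch (not taken from the PR) of how a model might exercise the scale-only path: a `MatMulNBits` node in the `com.microsoft` domain built with scales but with the optional `zero_points` input omitted. The tensor shapes, attribute names, and opset versions here are assumptions for illustration, not confirmed by the source.

```python
# Sketch: build a MatMulNBits (com.microsoft) node without zero_points,
# so the runtime must fall back to the scale-only dequantization path.
# Shapes/attributes below are illustrative assumptions.
import numpy as np
from onnx import TensorProto, helper, numpy_helper

K, N, block_size, bits = 64, 32, 32, 4
n_blocks = K // block_size              # blocks per column
blob_size = block_size * bits // 8      # packed bytes per block

a = helper.make_tensor_value_info("A", TensorProto.FLOAT16, [1, K])
y = helper.make_tensor_value_info("Y", TensorProto.FLOAT16, [1, N])

# Packed int4 weights and per-block scales as initializers.
b_quant = numpy_helper.from_array(
    np.zeros((N, n_blocks, blob_size), dtype=np.uint8), name="B_quant")
scales = numpy_helper.from_array(
    np.ones((N * n_blocks,), dtype=np.float16), name="scales")

# zero_points is intentionally omitted from the input list.
node = helper.make_node(
    "MatMulNBits",
    inputs=["A", "B_quant", "scales"],
    outputs=["Y"],
    domain="com.microsoft",
    K=K, N=N, block_size=block_size, bits=bits,
)

graph = helper.make_graph(
    [node], "matmulnbits_no_zero_points", [a], [y],
    initializer=[b_quant, scales])
model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 17),
                   helper.make_opsetid("com.microsoft", 1)])
```

Running such a model through the CUDA execution provider (when the hardware and dtype constraints are met) is the kind of scenario the existing `MatMulNBits.Fp16_Int4_NoZeroPoint` test is said to cover.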