Add MatMul FP4 and NF4 Support (#18066)

Commit

2 years ago

### Description

Add a contrib op, MatMulBnb4, and a related toolchain to support FP4 and NF4 quantization of the weight. This PR adds:

- a schema for the contrib op MatMulBnb4, which supports FP4 (4-bit floating point) and NF4 (4-bit NormalFloat) quantization of the weight;
- a naive implementation of MatMulBnb4 on CPU and GPU, i.e., computed as MatMul(A, Dequantize(B));
- a specialized GEMV implementation of MatMulBnb4 and a related benchmark tool;
- a tool to quantize a model to FP4 or NF4.
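The MatMul(A, Dequantize(B)) formulation and the single-row (GEMV) special case can be illustrated with a small pure-Python sketch. It assumes bitsandbytes-style blockwise absmax scaling. Each block of `block_size` weights is divided by its largest magnitude, and each normalized value is replaced by the index of the nearest entry in a 16-value NF4 table (the values published for QLoRA's NF4 type). The memory layout is an assumption and is not taken from the MatMulBnb4 schema: B is stored transposed as N x K, flattened, and packed two codes per byte with the high nibble first. All helper names (`quantize_blockwise`, `code_at`, `dequantize_blockwise`, `matmul_naive`, `gemv_fused`) are hypothetical and are not onnxruntime APIs. An FP4 variant would follow the same pattern with a different 16-entry table.

```python
# Sketch of blockwise NF4 weight quantization plus two MatMul paths.
# ASSUMPTIONS (not the documented MatMulBnb4 format): B is stored transposed
# as N x K row-major, flattened, split into contiguous blocks of `block_size`,
# and packed two 4-bit codes per byte, first code in the high nibble.
from typing import List, Tuple

# 16-entry NF4 (4-bit NormalFloat) lookup table as published for QLoRA.
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823029, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_blockwise(w: List[float], block_size: int) -> Tuple[bytes, List[float]]:
    """Scale each block by its absmax, then map each value to the nearest NF4 code."""
    codes: List[int] = []
    absmax: List[float] = []
    for start in range(0, len(w), block_size):
        block = w[start:start + block_size]
        scale = max(abs(x) for x in block)
        absmax.append(scale)
        for x in block:
            normalized = x / scale if scale > 0 else 0.0
            codes.append(min(range(16), key=lambda i: abs(NF4_CODE[i] - normalized)))
    if len(codes) % 2:
        codes.append(7)  # pad with the index of 0.0
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, absmax


def code_at(packed: bytes, i: int) -> int:
    """Read the i-th 4-bit code (high nibble first)."""
    byte = packed[i // 2]
    return byte >> 4 if i % 2 == 0 else byte & 0x0F


def dequantize_blockwise(packed: bytes, absmax: List[float], count: int,
                         block_size: int) -> List[float]:
    """Look up each code and rescale it by its block's absmax."""
    return [NF4_CODE[code_at(packed, i)] * absmax[i // block_size] for i in range(count)]


def matmul_naive(a: List[List[float]], packed: bytes, absmax: List[float],
                 k: int, n: int, block_size: int) -> List[List[float]]:
    """MatMul(A, Dequantize(B)): materialize all of B^T, then multiply."""
    bt = dequantize_blockwise(packed, absmax, n * k, block_size)
    return [[sum(row[kk] * bt[j * k + kk] for kk in range(k)) for j in range(n)]
            for row in a]


def gemv_fused(x: List[float], packed: bytes, absmax: List[float],
               k: int, n: int, block_size: int) -> List[float]:
    """Single-row case: dequantize each weight inside the dot product."""
    out: List[float] = []
    for j in range(n):
        acc = 0.0
        for kk in range(k):
            i = j * k + kk
            acc += x[kk] * (NF4_CODE[code_at(packed, i)] * absmax[i // block_size])
        out.append(acc)
    return out


if __name__ == "__main__":
    K, N, BLOCK = 4, 2, 4
    w_t = [0.5, -1.0, 0.25, 0.0, 2.0, 1.0, -2.0, 0.5]  # B^T, N x K row-major
    packed, absmax = quantize_blockwise(w_t, BLOCK)
    a = [[1.0, 2.0, 3.0, 4.0]]
    print(matmul_naive(a, packed, absmax, K, N, BLOCK))
    print(gemv_fused(a[0], packed, absmax, K, N, BLOCK))
```

Both paths compute the same result. The naive path allocates the full dequantized matrix before multiplying, while the fused loop holds only one dequantized weight at a time. That difference is the kind of saving a dedicated single-row kernel can exploit.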

References

#18066 - Add MatMul FP4 and NF4 Support

Author

jambayk

Parents

d88d52ee

onnxruntime d30d4d37 - Add MatMul FP4 and NF4 Support (#18066)
