onnxruntime
64674c50 - Added a tool to quantize Gather to GatherBlockQuantized (#21697)

Commit

275 days ago

Added a tool to quantize Gather to GatherBlockQuantized (#21697)

### Description

Added code in MatMul4BitsQuantizer to quantize Gather to GatherBlockQuantized. Only Gather nodes with constant data are quantized. Because the quantized data is int4, the quantized model is forcibly upgraded to ONNX opset 21.

The implementation relies purely on numpy. If optimization is needed, C++ kernels can be added later.

Only the default RTN algorithm is supported, since GatherBlockQuantized requires zero points to have the same type as the quantized data.

### Motivation and Context

Support quantizing Gather to int4 in Web scenarios.
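For readers unfamiliar with the technique, the following is a minimal numpy sketch of block-wise round-to-nearest (RTN) quantization of a constant Gather table to 4-bit values, followed by a gather that dequantizes on lookup. It is not the code from this commit: the function names, the default block size of 32, the asymmetric (zero-point) scheme, and storing one 4-bit value per `uint8` instead of packing two per byte are all assumptions made for illustration.

```python
import numpy as np


def quantize_blockwise_rtn(data, block_size=32):
    """Hypothetical helper: asymmetric RTN quantization to 4-bit values.

    One scale and one zero point are computed per block of `block_size`
    elements along the last axis of a 2-D table.
    """
    rows, cols = data.shape
    assert cols % block_size == 0, "last axis must be a multiple of block_size"
    blocks = data.astype(np.float32).reshape(rows, cols // block_size, block_size)

    # Extend each block's range to include 0 so that 0.0 is exactly representable.
    min_val = np.minimum(blocks.min(axis=-1, keepdims=True), 0.0)
    max_val = np.maximum(blocks.max(axis=-1, keepdims=True), 0.0)

    # 4 bits give 16 levels, i.e. 15 steps between min and max.
    scale = (max_val - min_val) / 15.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid division by zero for all-zero blocks

    zero_point = np.clip(np.round(-min_val / scale), 0, 15)
    q = np.clip(np.round(blocks / scale) + zero_point, 0, 15).astype(np.uint8)

    # Zero points are stored in the same integer type as the quantized data.
    return (
        q.reshape(rows, cols),
        scale.squeeze(-1),
        zero_point.squeeze(-1).astype(np.uint8),
    )


def gather_block_quantized(q, scale, zero_point, indices, block_size=32):
    """Hypothetical reference: gather rows, then dequantize them block by block."""
    q_rows = q[indices]
    n, cols = q_rows.shape
    blocks = q_rows.reshape(n, cols // block_size, block_size).astype(np.float32)
    zp = zero_point[indices].astype(np.float32)[..., None]
    deq = (blocks - zp) * scale[indices][..., None]
    return deq.reshape(n, cols)
```

A short usage example under the same assumptions, with a random table standing in for an embedding matrix:

```python
rng = np.random.default_rng(0)
table = rng.standard_normal((8, 64)).astype(np.float32)
q, scale, zp = quantize_blockwise_rtn(table, block_size=32)
rows = gather_block_quantized(q, scale, zp, np.array([3, 0, 3]), block_size=32)
```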

References

#21697 - Added a tool to quantize Gather to GatherBlockQuantized

Author

fajin-corp

Parents

7ae0b4ce

Files (3)

onnxruntime
- python/tools/quantization
  - matmul_4bits_quantizer.py
  - quantize.py
- test/python/quantization
  - test_op_matmul_4bits.py
