onnxruntime
5dee95fa - [CUDA] Support CUDA EP blocked quantization in Q/DQ ops. (#21846)

Commit
1 year ago
### Description

1. Added CUDA EP support for blocked quantization in the QuantizeLinear and DequantizeLinear ops.
2. Currently, CUDA EP blocked quantization supports only int4/uint4 quantized types and float32/float16 unquantized types.
3. Added CUDA EP support to the QDQ selector/action transformer. CUDA EP is added only to the DQ + MatMul -> MatMulNBits rule; EP support for the other rules is unchanged.

### Motivation and Context

ONNX opset 21 introduced blocked quantization for the Q/DQ ops. ORT originally supported blocked quantization only on the CPU EP.
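For readers unfamiliar with blocked quantization, the following is a minimal pure-Python sketch of the arithmetic as the ONNX opset 21 Q/DQ ops define it: each run of `block_size` consecutive elements along the quantized axis shares one scale and one zero point. It is not the CUDA kernel from this commit. The function names `quantize_blocked` and `dequantize_blocked` are hypothetical, and the sketch assumes a single 1-D row with the int4 range [-8, 7] as the default.

```python
def quantize_blocked(row, scales, zero_points, block_size, qmin=-8, qmax=7):
    """Quantize a 1-D list of floats block by block (int4 range by default).

    Illustrative only: names and signature are assumptions, not ORT APIs.
    """
    out = []
    for i, x in enumerate(row):
        b = i // block_size  # index of the block this element belongs to
        q = round(x / scales[b]) + zero_points[b]  # round() rounds half to even
        out.append(max(qmin, min(qmax, q)))  # saturate to the quantized range
    return out


def dequantize_blocked(qrow, scales, zero_points, block_size):
    """Recover approximate floats: (q - zero_point) * scale, per block."""
    out = []
    for i, q in enumerate(qrow):
        b = i // block_size
        out.append((q - zero_points[b]) * scales[b])
    return out


# Two blocks of four elements, each with its own scale.
row = [0.5, -1.0, 1.5, 2.0, 10.0, -20.0, 30.0, 40.0]
scales = [0.5, 10.0]
zero_points = [0, 0]

q = quantize_blocked(row, scales, zero_points, block_size=4)
restored = dequantize_blocked(q, scales, zero_points, block_size=4)
```

Because the second block has its own, larger scale, its values fit in the 4-bit range without forcing the small values in the first block to lose precision, which is the main reason to quantize per block rather than per tensor.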