onnxruntime
5612ce51 - [WebNN] Support MatMulNBits op (#24142)

Committed 262 days ago
The MatMulNBits op can be emulated simply as DequantizeLinear + Transpose + MatMul. Currently only 4-bit quantization is supported, so the B and zero_points (if present) inputs must be known as initializers with data type 'uint8', and we need to register them as 'uint4' WebNN constants. Typically, all initializers are registered as WebNN constants in one step via `ModelBuilder::RegisterInitializers` before constructing the WebNN graph. However, because WebNN doesn't support casting to 'uint4', we must defer the registration of these two inputs until `MatMulNBitsBuilder::AddToModelBuilderImpl` is invoked.
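The emulation path described above can be sketched numerically. This is a minimal NumPy illustration, not the actual C++ implementation in this commit, and it assumes a simplified per-row quantization layout (the real MatMulNBits op uses block-wise scales and zero points); the packing convention of two 4-bit values per uint8 byte, low nibble first, is also an assumption for illustration.

```python
import numpy as np

def unpack_uint4(packed: np.ndarray) -> np.ndarray:
    # Assumed packing: each uint8 byte holds two 4-bit values, low nibble first.
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)

def matmul_nbits_emulated(A, B_packed, scales, zero_points, K, N):
    """Emulate MatMulNBits as DequantizeLinear + Transpose + MatMul.

    A: (M, K) float32 activations.
    B_packed: (N, K // 2) uint8, two 4-bit weight values per byte.
    scales, zero_points: per-row (N,) quantization parameters (simplified
    from the op's block-wise layout).
    """
    # DequantizeLinear step: (q - zero_point) * scale.
    q = unpack_uint4(B_packed).reshape(N, K).astype(np.float32)
    deq = (q - zero_points[:, None].astype(np.float32)) * scales[:, None]
    # Transpose step: quantized B is stored as (N, K); MatMul needs (K, N).
    return A @ deq.T
```

For example, with K=4, N=2, weights [[1,2,3,4],[5,6,7,8]] pack to bytes [[33,67],[101,135]]; with scales [1.0, 0.5] and zero points [0, 8], multiplying by a row of ones yields [[10.0, -3.0]].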