onnxruntime
78118392 - [QNN EP] Always fuse (DQ->Q) to a QNN Convert operator (#22205)

Commit
1 year ago
[QNN EP] Always fuse (DQ->Q) to a QNN Convert operator (#22205)

### Description

Previously, we fused (DQ -> Q) into a QNN Convert op only when the quantization types differed (e.g., converting uint8 to uint16). This PR always fuses DQ -> Q, regardless of the quantization type, because a single QNN Convert op is faster than two separate ops.

Example fusions:

- [CURRENTLY SUPPORTED] Convert uint8 to uint16:
  - `uint8 -> DQ -> Q -> uint16` becomes `uint8 -> Convert -> uint16`
- [CURRENTLY SUPPORTED] Convert uint16 to uint8:
  - `uint16 -> DQ -> Q -> uint8` becomes `uint16 -> Convert -> uint8`
- [NEW] Convert uint8 (zp0, scale0) to uint8 (zp1, scale1):
  - `uint8(zp0/scale0) -> DQ -> Q -> uint8(zp1/scale1)` becomes `uint8(zp0/scale0) -> Convert -> uint8(zp1/scale1)`
- [NEW] Convert uint16 (zp0, scale0) to uint16 (zp1, scale1):
  - `uint16(zp0/scale0) -> DQ -> Q -> uint16(zp1/scale1)` becomes `uint16(zp0/scale0) -> Convert -> uint16(zp1/scale1)`

### Motivation and Context

The Transpose optimizer normally removes empty DQ -> Q sequences when the quantization parameters are equal. However, when the quantization parameters are not equal, QNN EP should fuse DQ -> Q into a single QNN Convert op for performance. This affects a customer model.
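To see why a DQ -> Q pair with differing quantization parameters is equivalent to a single requantization (Convert) step, the arithmetic can be sketched in NumPy. This is an illustrative sketch of affine quantization, not QNN EP code; the function names and the example scales/zero-points are made up for the demonstration.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # DQ: map quantized integers back to float32
    return (q.astype(np.float32) - zero_point) * scale

def quantize(x, scale, zero_point, dtype):
    # Q: map floats to quantized integers, clamping to the dtype's range
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def convert(q, scale0, zp0, scale1, zp1, dtype):
    # Fused equivalent of DQ -> Q: requantize in one step
    return quantize(dequantize(q, scale0, zp0), scale1, zp1, dtype)

# The [NEW] case: uint8 (zp0, scale0) -> uint8 (zp1, scale1)
q_in = np.array([0, 64, 128, 255], dtype=np.uint8)
unfused = quantize(dequantize(q_in, 0.02, 128), 0.01, 0, np.uint8)
fused = convert(q_in, scale0=0.02, zp0=128, scale1=0.01, zp1=0, dtype=np.uint8)
assert np.array_equal(fused, unfused)
```

The fused path computes the same values as the separate DQ and Q ops; the win on QNN is dispatching one hardware op instead of two.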