onnxruntime
84d48b6a - [QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP (#22436)

Commit
1 year ago
### Description

Adds QNN provider option `offload_graph_io_quantization` to offload graph input quantization and graph output dequantization to the CPU EP. Option is disabled by default to maintain current behavior.

### Motivation and Context

Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
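
A minimal sketch of how the new option might be enabled from the Python API. The option name `offload_graph_io_quantization` comes from this commit. The value `"1"` for "enabled", the `backend_path` value `QnnHtp.dll`, and the helper names `qnn_provider_config` and `create_session` are assumptions made for illustration and are not part of this change.

```python
def qnn_provider_config(offload_io: bool, backend_path: str = "QnnHtp.dll"):
    """Build (providers, provider_options) for an onnxruntime session.

    Hypothetical helper: the option name is taken from the commit; the
    "1" value and the default backend_path are assumptions.
    """
    qnn_options = {"backend_path": backend_path}
    if offload_io:
        # Ask the QNN EP to leave graph input quantization and graph
        # output dequantization to the CPU EP (off by default).
        qnn_options["offload_graph_io_quantization"] = "1"
    # The CPU EP is listed as well so the offloaded nodes have an
    # explicit fallback provider.
    providers = ["QNNExecutionProvider", "CPUExecutionProvider"]
    provider_options = [qnn_options, {}]
    return providers, provider_options


def create_session(model_path: str, offload_io: bool = True):
    """Create an InferenceSession using the config above."""
    # Imported lazily so the config helper works without onnxruntime.
    import onnxruntime as ort

    providers, provider_options = qnn_provider_config(offload_io)
    return ort.InferenceSession(
        model_path,
        providers=providers,
        provider_options=provider_options,
    )
```

Leaving the key out of the options dictionary keeps the default (disabled) behavior, so existing sessions are unaffected unless they opt in.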