dequantization to the CPU EP (#22436)

Commit

1 year ago

[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP (#22436)

### Description
Adds QNN provider option `offload_graph_io_quantization` to offload graph input quantization and graph output dequantization to the CPU EP. Option is disabled by default to maintain current behavior.

### Motivation and Context
Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
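As a rough illustration of how the new option might be passed from the Python API, here is a minimal sketch. The option name `offload_graph_io_quantization` comes from the commit message. The string values `"1"`/`"0"`, the helper names `qnn_provider_options` and `make_session`, and the `QnnHtp.dll` backend path are assumptions made for this example and are not taken from the commit itself.

```python
def qnn_provider_options(backend_path, offload_io):
    """Build a QNN EP provider-options dict (hypothetical helper).

    Assumption: the option is enabled with the string "1" and disabled
    with "0", following the usual convention for QNN EP provider options.
    """
    return {
        "backend_path": backend_path,
        "offload_graph_io_quantization": "1" if offload_io else "0",
    }


def make_session(model_path, backend_path="QnnHtp.dll", offload_io=True):
    """Create an inference session that prefers the QNN EP (hypothetical helper).

    onnxruntime is imported lazily so the options helper above can be
    used and tested without a QNN-enabled build installed.
    """
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        # The CPU EP is listed as a fallback; with offloading enabled it
        # would also handle the graph I/O quantize/dequantize nodes.
        providers=["QNNExecutionProvider", "CPUExecutionProvider"],
        provider_options=[qnn_provider_options(backend_path, offload_io), {}],
    )
```

Calling `make_session("model.qdq.onnx")` would require an onnxruntime build with the QNN EP and a quantized model; the file name here is a placeholder. Since the commit states the option is disabled by default, leaving it out of the options dict should preserve the previous behavior.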

References

#22436 - [QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP

Author

adrianlizarraga

Parents

b7050c83

onnxruntime 84d48b6a - [QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP (#22436)