onnxruntime
b47e1e64 - [QNN EP] Make offloading graph input/output quantization (to CPU) the default (#23368)

Commit
331 days ago
### Description
Makes the QNN provider option `offload_graph_io_quantization` enabled by default. It was previously disabled by default.

### Motivation and Context
Enabling this option significantly decreases inference latency for many models.
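Since this change flips the default, applications that relied on the old behavior must now disable the option explicitly. A minimal sketch of how that might look with the ONNX Runtime Python API, assuming a hypothetical model file `model.qdq.onnx` and the HTP backend library name for your platform; provider option values are passed as strings:

```python
# Hypothetical sketch: opting back out of the new default by passing
# explicit provider options to the QNN execution provider.
qnn_options = {
    # Backend library name is platform-specific, e.g. "QnnHtp.dll" on
    # Windows or "libQnnHtp.so" on Linux/Android (assumption for this sketch).
    "backend_path": "QnnHtp.dll",
    # "0" restores the pre-#23368 behavior: graph input/output
    # quantization stays on QNN instead of being offloaded to CPU.
    "offload_graph_io_quantization": "0",
}

providers = [("QNNExecutionProvider", qnn_options)]

# Creating the session requires an onnxruntime build with the QNN EP:
# import onnxruntime as ort
# session = ort.InferenceSession("model.qdq.onnx", providers=providers)
print(providers)
```

Omitting `offload_graph_io_quantization` (or setting it to `"1"`) now yields the offloading behavior by default.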