onnxruntime
788ca51b - [QNN-EP]: Fix inference failures while running with htp_shared_memory (#23892)

Commit
1 year ago
[QNN-EP]: Fix inference failures while running with htp_shared_memory (#23892) ### Description When using the enable_htp_shared_memory feature, we see that the address of the buffer passed to rpcmem_free is incorrect. So the rpc buffers are not freed leading to memory exhaustion. ### Motivation and Context When using the enable_htp_shared_memory_allocator feature for QNN in GenAI extensions, it leads to inference failures during the second prompt. As GenAI memory asks are higher, it surfaces sooner in gen AI use cases. Co-authored-by: Ashish Garg <ashigarg@qti.qualcomm.com>
Author
Parents
Loading