onnxruntime
788ca51b - [QNN-EP]: Fix inference failures while running with htp_shared_memory (#23892)

Commit

1 year ago

[QNN-EP]: Fix inference failures while running with htp_shared_memory (#23892)

### Description
When the enable_htp_shared_memory feature is in use, the buffer address passed to rpcmem_free is incorrect. As a result, the RPC buffers are never freed, which leads to memory exhaustion.

### Motivation and Context
Using the enable_htp_shared_memory_allocator feature for QNN in GenAI extensions causes inference to fail on the second prompt. Because GenAI workloads have higher memory demands, the problem surfaces sooner in GenAI use cases.

Co-authored-by: Ashish Garg <ashigarg@qti.qualcomm.com>
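The commit message does not include the diff, so the following is only a minimal sketch of the general failure mode it describes: a free call that receives an address other than the one the underlying pool handed out, so nothing is released. Every name here (`MockRpcPool`, `Header`, `WrappedAlloc`, `BuggyFree`, `CorrectFree`) is hypothetical and stands in for the real allocator; the sketch does not call the actual rpcmem API and assumes, for illustration, a wrapper that stores a small header in front of the bytes it returns to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <unordered_set>

// Hypothetical stand-in for an RPC memory pool: it releases a buffer only when
// given the exact base address it originally returned.
class MockRpcPool {
 public:
  ~MockRpcPool() {
    for (void* p : live_) std::free(p);  // release anything still outstanding
  }
  void* Alloc(std::size_t size) {
    void* base = std::malloc(size);
    if (base != nullptr) live_.insert(base);
    return base;
  }
  // Returns false (and frees nothing) for an address the pool did not hand out.
  bool Free(void* addr) {
    auto it = live_.find(addr);
    if (it == live_.end()) return false;
    std::free(addr);
    live_.erase(it);
    return true;
  }
  std::size_t LiveCount() const { return live_.size(); }

 private:
  std::unordered_set<void*> live_;
};

// Hypothetical bookkeeping header placed in front of the user-visible bytes.
struct alignas(std::max_align_t) Header {
  void* base;  // address the pool actually returned
};

// Allocates header + payload and returns a pointer to the payload.
void* WrappedAlloc(MockRpcPool& pool, std::size_t size) {
  auto* base = static_cast<std::uint8_t*>(pool.Alloc(sizeof(Header) + size));
  if (base == nullptr) return nullptr;
  new (base) Header{base};
  return base + sizeof(Header);
}

// Buggy variant: hands the payload pointer to the pool, which does not know it.
bool BuggyFree(MockRpcPool& pool, void* user) {
  return pool.Free(user);
}

// Corrected variant: recovers the pool's base address from the header first.
bool CorrectFree(MockRpcPool& pool, void* user) {
  auto* header = reinterpret_cast<Header*>(
      static_cast<std::uint8_t*>(user) - sizeof(Header));
  return pool.Free(header->base);
}
```

In this sketch `BuggyFree` leaves `LiveCount()` unchanged, so repeated allocations accumulate until the pool runs out, while `CorrectFree` returns the buffer. Whether the real fix recovers the address this way is an assumption; the sketch only shows why a mismatched address makes buffers pile up across prompts.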

References

#23892 - [QNN-EP]: Fix inference failures while running with htp_shared_memory

Author

quic-ashigarg

Parents

9d0dc9f0
