onnxruntime
8bb3b07c - Implement experimental intermediate cross CPU EP allocation (#24371)

329 days ago
### Description

Onnxruntime manages a number of CPU based accelerators, i.e. those that can operate on CPU based inputs. However, several of them, such as `Qnn`, `Openvino` and `Vitis`, may either require CPU based inputs to be aligned to 4K so they can be memory mapped, or prefer to override the device with their own CPU accessible allocator.

To mitigate that, we introduce a new CPU based allocator that produces 4K aligned memory. We also adjust the allocation planner to override the plain CPU device. When we detect a compiled CPU based EP, we adjust the device accordingly by requesting the EP to return `OrtMemType::OrtMemTypeCPUInput`. This gives the EP an opportunity to return either a GPU/NPU device or a CPU device, depending on the mode it is operating in. We select the device with the larger alignment between CPU default devices.

We also adjust memory patterns to make sure 4K alignment is respected in the contiguous buffers when appropriate.

### Motivation and Context

CPU based providers accept CPU based inputs, but they require 4K aligned allocations; otherwise the input incurs an extra copy. This is especially noticeable with intermediate values that are produced by upstream CPU based nodes.

Qnn has its own allocator when it is enabled; we make sure it is correctly advertised to the allocation planner. This PR excludes Qnn allocator usage for intermediate values due to the overhead contributed by memhandle management.

Cc: @quic-ashigarg

---------

Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>
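The three ideas in the description (4K aligned allocation, choosing the larger of two alignments, and keeping 4K alignment inside a contiguous buffer) can be illustrated with a minimal sketch. This is not the code from this change: every name below (`kAlignment4K`, `AlignUp`, `AllocAligned4K`, `PickAlignment`, `PlanOffsets`) is hypothetical, and the sketch assumes a C++17 toolchain that provides `std::aligned_alloc` (MSVC, for example, does not).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical constant for the 4K alignment discussed above.
constexpr std::size_t kAlignment4K = 4096;

// Round `size` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two.
constexpr std::size_t AlignUp(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Allocate a buffer whose address is a multiple of 4096.
// std::aligned_alloc expects the size to be a multiple of the alignment,
// so the request is rounded up (and a zero-size request gets one full page).
inline void* AllocAligned4K(std::size_t size) {
  const std::size_t padded = size == 0 ? kAlignment4K : AlignUp(size, kAlignment4K);
  return std::aligned_alloc(kAlignment4K, padded);
}

inline void FreeAligned4K(void* p) { std::free(p); }

// Given two candidate alignments, keep the stricter (larger) one.
constexpr std::size_t PickAlignment(std::size_t a, std::size_t b) {
  return a > b ? a : b;
}

// Check whether a pointer satisfies the requested alignment.
inline bool IsAligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Lay out several blocks in one contiguous buffer so that every block
// starts at an offset that is a multiple of `alignment`.
inline std::vector<std::size_t> PlanOffsets(const std::vector<std::size_t>& sizes,
                                            std::size_t alignment) {
  std::vector<std::size_t> offsets;
  offsets.reserve(sizes.size());
  std::size_t cursor = 0;
  for (std::size_t size : sizes) {
    offsets.push_back(cursor);                  // cursor is always aligned here
    cursor = AlignUp(cursor + size, alignment); // pad up to the next boundary
  }
  return offsets;
}
```

If the base of the contiguous buffer is itself 4K aligned, every block placed at one of the offsets from `PlanOffsets` is 4K aligned as well, which is what lets a downstream consumer use the memory without an extra copy.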