onnxruntime
089c52e4 - Add python bindings to the global thread pool functionality (#24238)

Commit
283 days ago
### Description

Allows users to configure and enable the global thread pool via Python, and have inference sessions use it instead of session-local thread pools.

### Motivation and Context

Forked off of #23495 to take over implementation, see issue #23523.

Our particular use case involves a single service instance serving thousands of individual models, each relatively small (e.g. small decision trees). Creating individual services for each model is too much overhead, and attempting to start several thousand thread-pools is a non-starter. We could possibly have each session be single-threaded, but we would like to be able to separate the request handler thread count from the compute thread count (e.g. 2 handler threads but 4 intra-op ones).

---------

Co-authored-by: alex-halpin <alex.halpin@prizepicks.com>
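
The threading layout described in the motivation can be sketched in plain Python. This is only an illustration of the design (many small sessions borrowing one shared compute pool, with a separate and smaller handler pool), not the onnxruntime API. `TinyModelSession`, `serve`, and the multiply-by-id "model" are hypothetical names made up for this sketch, and `concurrent.futures` stands in for the global thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


class TinyModelSession:
    """Hypothetical stand-in for an inference session (not an onnxruntime class)."""

    def __init__(self, model_id, compute_pool):
        self.model_id = model_id
        # The session borrows the shared pool; it does not create its own.
        self._pool = compute_pool

    def run(self, x):
        # Placeholder "inference": multiply the input by the model id.
        # The work is executed on the shared compute pool.
        return self._pool.submit(lambda: x * self.model_id).result()


def serve(num_models=1000, handler_threads=2, compute_threads=4):
    """Run one request per model using separate handler and compute pools."""
    # One compute pool for all sessions, instead of one pool per session.
    with ThreadPoolExecutor(max_workers=compute_threads) as compute_pool:
        sessions = [TinyModelSession(i, compute_pool) for i in range(num_models)]
        # Handler threads are sized independently of the compute threads.
        with ThreadPoolExecutor(max_workers=handler_threads) as handlers:
            return list(handlers.map(lambda s: s.run(2), sessions))


if __name__ == "__main__":
    results = serve(num_models=5)
    print(results)
```

With this layout the number of threads stays at `handler_threads + compute_threads` regardless of how many sessions exist, which is the property the commit message is after.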