onnxruntime
d5fa2ac1 - Improve Windows ETW callback registration and fix issues (#24877)

Commit
221 days ago
### Description

- `EtwRegistrationManager`: make sure all fields are initialized by the constructor.
- Register a callback object instead of a pointer to it, and store it in the map under a session-unique key.
- Register `ML_Ort_Provider_Etw_Callback` once for all sessions. This callback logs all sessions. The first session registers it, and the last session to go away removes it. To support this, callbacks are reference-counted inside the map that stores them. This prevents a deadlock in which `active_sessions_mutex_` and `callback_mutex_` are acquired in different orders by different threads.
- Create a registration guard that removes the callbacks if the `InferenceSession` constructor does not finish.

### Motivation and Context

This PR is inspired by https://github.com/microsoft/onnxruntime/issues/24773?reload=1. The current code exhibits multiple issues:

- The `EtwRegistrationManager` constructor does not initialize all of its fields, including the `InitializationStatus`.
- The global callback object is registered and re-created by every session: each time a session is created, the callback object is destroyed and recreated. Customers sometimes run thousands of models in the same sessions, which results in quadratic ETW cost.
- The `InferenceSession` constructor may not finish, in which case the callback remains registered. This can cause intermittent, hard-to-diagnose bugs.
- `active_sessions_lock_` and the callback lock are not acquired and released in the same order by different threads, which is a classic deadlock scenario.
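The reference-counted registration described above (the first session registers a shared callback, later sessions only add a reference, and the last one to go away removes it) can be sketched as follows. This is a minimal illustration, not the onnxruntime implementation: `CallbackRegistry`, `EtwCallback`, and the `Register`/`Unregister`/`Invoke`/`RefCount` methods are hypothetical names. The single-`int` callback signature is also an assumption; real ETW enable callbacks take more parameters. `Invoke` additionally shows a common way to keep lock ordering simple, which is to copy the callbacks under the registry lock and call them after releasing it. That part is a general technique and is not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ETW callback; the real signature differs.
using EtwCallback = std::function<void(int level)>;

// Hypothetical registry: one entry per key, shared by every session that
// registers under that key and reference-counted.
class CallbackRegistry {
 public:
  // Stores the callback on the first registration of `key`; later calls
  // only bump the count. Returns true only for the first registration.
  bool Register(const std::string& key, EtwCallback cb) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto [it, inserted] = callbacks_.try_emplace(key, Entry{std::move(cb), 0});
    ++it->second.ref_count;
    return inserted;
  }

  // Drops one reference. Returns true when the last reference is dropped
  // and the entry is erased.
  bool Unregister(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = callbacks_.find(key);
    if (it == callbacks_.end()) return false;
    if (--it->second.ref_count > 0) return false;
    callbacks_.erase(it);
    return true;
  }

  // Copies the callbacks while holding mutex_, then calls them after the
  // lock is released, so any lock a callback takes is never nested inside
  // mutex_.
  void Invoke(int level) {
    std::vector<EtwCallback> snapshot;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (const auto& kv : callbacks_) snapshot.push_back(kv.second.fn);
    }
    for (const auto& fn : snapshot) fn(level);
  }

  // Current number of references held for `key` (0 if not registered).
  std::size_t RefCount(const std::string& key) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = callbacks_.find(key);
    return it == callbacks_.end() ? 0 : it->second.ref_count;
  }

 private:
  struct Entry {
    EtwCallback fn;
    std::size_t ref_count;
  };
  mutable std::mutex mutex_;
  std::unordered_map<std::string, Entry> callbacks_;
};
```

In this sketch, a second `Register` call with the same key (for example the hypothetical string key `"ML_Ort_Provider_Etw_Callback"`) only increments the count. The stored callback object is therefore not destroyed and recreated for every new session.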
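The registration guard mentioned in the description is an instance of the RAII "scope guard" idiom. If a constructor registers a callback and then throws, the object's destructor never runs, so the guard's destructor has to undo the registration during stack unwinding. The sketch below assumes a hypothetical `RegistrationGuard` class, a hypothetical `Session` class standing in for `InferenceSession`, and a hypothetical global counter `g_registered_callbacks` standing in for the ETW provider. None of these are onnxruntime APIs.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: runs `undo` on destruction unless Dismiss() was
// called first.
class RegistrationGuard {
 public:
  explicit RegistrationGuard(std::function<void()> undo) : undo_(std::move(undo)) {}
  RegistrationGuard(const RegistrationGuard&) = delete;
  RegistrationGuard& operator=(const RegistrationGuard&) = delete;
  ~RegistrationGuard() {
    if (undo_) undo_();
  }
  // Called once construction has succeeded; from then on the owning
  // object's destructor is responsible for unregistering.
  void Dismiss() noexcept { undo_ = nullptr; }

 private:
  std::function<void()> undo_;
};

// Hypothetical stand-in for the ETW provider's set of registered callbacks.
inline int g_registered_callbacks = 0;

// Hypothetical session whose constructor registers a callback and may fail
// during later initialization.
class Session {
 public:
  explicit Session(bool fail_after_register) {
    ++g_registered_callbacks;  // "register" the callback
    RegistrationGuard guard([] { --g_registered_callbacks; });
    if (fail_after_register) throw std::runtime_error("init failed");
    guard.Dismiss();  // success: ~Session now owns unregistration
  }
  ~Session() { --g_registered_callbacks; }
};
```

When the constructor throws, `~Session` is not called because the object was never fully constructed. The guard's destructor still runs and removes the registration, so no callback is left behind.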