Thread pool profiler (#6748)

* add profiler
* add thread id
* refactoring
* switch to vector
* add override keyword
* fix comments
* renaming
* add revoke time
* restore statics
* restore enable flag
* fix end error
* fix comments
* add comment
* add comments
* make profiler thread-safe
* switch to shared_lock
* switch to shared_timed_mutex
* switch to OrtMutex
* add per child thread counters
* switch to vector
* refactor LogCore
* fix comments
* remove spin and block counters to reduce overhead
* fix minor format issue

Co-authored-by: Randy Shuai <rashuai@microsoft.com>