CUDA Profiler (#7110)
* implement CUDA profiler
* add counters
* downgrade CUPTI kernel version
* move mutex
* add CUPTI to path
* fix Windows GPU build error
* add path for CUDA 10
* fix Linux compile error
* extend include path
* add init flag
* fix test case
* fix TensorRT pipeline
* add unit test
Co-authored-by: Ubuntu <randysheriff@rashuai-linux-gpu-3.3cfnmjowvu4e5bidlsmcxsmzwg.xx.internal.cloudapp.net>