[MPS] Add support for MPSProfiler (#100635)
- Enable event- and interval-based OS signpost tracing via env-var `PYTORCH_MPS_TRACE_SIGNPOSTS` (Python bindings sent in a separate PR).
- Enable logging of MPS graphs, native kernels, and copies and their GPU times via env-var `PYTORCH_MPS_LOG_PROFILE_INFO`.
- Enable dumping the table of kernel profiling results, sorted by mean GPU time, when the process ends (SIGINT is also handled).
- Fix a bug in MPSAllocator where the allocator's completionHandlers were called after the MPSAllocator instance was destroyed.
- Add an option to use schedule handlers to begin signpost intervals.
- Refer to comments in `MPSProfiler.h` to learn how to set env-vars for logging and signpost tracing. Proper documentation will be sent in a separate PR later.
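
As a rough illustration of how the two env-vars above might be applied, here is a minimal Python sketch that runs a script in a child process with the variables set. The variable names come from this change; the helper names (`mps_profiler_env`, `run_with_profiler`) and the example value `"1"` are hypothetical, and the assumption that each variable takes an integer bitmask should be checked against the comments in `MPSProfiler.h`.

```python
import os
import subprocess
import sys


def mps_profiler_env(log_options=None, signpost_options=None, base_env=None):
    """Return a copy of the environment with the MPS profiler env-vars set.

    Assumption: both variables accept an integer bitmask; the valid bits are
    documented in MPSProfiler.h and are not reproduced here.
    """
    env = dict(os.environ if base_env is None else base_env)
    if log_options is not None:
        env["PYTORCH_MPS_LOG_PROFILE_INFO"] = str(log_options)
    if signpost_options is not None:
        env["PYTORCH_MPS_TRACE_SIGNPOSTS"] = str(signpost_options)
    return env


def run_with_profiler(script_path, **options):
    """Run a Python script in a child process with the profiler env-vars set.

    A child process is used so the variables are in place before PyTorch
    initializes the MPS backend.
    """
    env = mps_profiler_env(**options)
    return subprocess.run([sys.executable, script_path], env=env, check=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python this_file.py train.py
    # The value "1" is a placeholder, not a documented option.
    run_with_profiler(sys.argv[1], log_options="1")
```

Setting the variables on a child process rather than in the current one avoids depending on import order inside the profiled script.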
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100635
Approved by: https://github.com/kulinseth