CUDA trace Python hooks (#82824)
### Description
This adds Python hooks into PyTorch that allow the user to register their own callbacks for events such as tensor allocation, stream allocation, event record / wait etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82824
Approved by: https://github.com/lw, https://github.com/ezyang, https://github.com/malfet