[caffe2][cuda] Trace `allocate` and `local_raw_delete` events with PyTorch USDTs (#107322)
Summary: Adds static tracepoints (USDTs) to the CUDA allocator code to track allocation and deallocation events.
Test Plan: This change only adds static tracepoints to the CUDA allocator code and does not otherwise change any logic, so testing is not required.
Reviewed By: chaekit
Differential Revision: D48229150
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107322
Approved by: https://github.com/chaekit