[MPS] Use Metal Events to synchronize buffers in MPSAllocator (Part 1) (#106938)
- This PR is the first part of a larger change that uses `MPSEvent` to synchronize shared buffers between the CPU and GPU.
- Add APIs to record and wait for `MPSEvent`s in `MPSAllocator`.
- Use a container list for Buffer Pools to simplify iterating over them.
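For readers unfamiliar with the record/wait pattern, here is a minimal, portable sketch of the idea in plain C++. It is not the code from this PR: `ToyEvent`, `ToyBuffer`, `ToyPool`, `ToyAllocator`, and their methods are hypothetical names, the "GPU" is simulated with a `std::thread`, and the pool names and sizes are made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU event: a counter that the producer
// signals and the consumer blocks on.
class ToyEvent {
 public:
  // Producer side: mark all work up to `value` as complete.
  void signal(uint64_t value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (value > signaled_) signaled_ = value;
    }
    cv_.notify_all();
  }
  // Consumer side: block until the counter reaches `value`.
  void wait(uint64_t value) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return signaled_ >= value; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  uint64_t signaled_ = 0;
};

// Hypothetical shared buffer that carries its own event.
struct ToyBuffer {
  int data = 0;
  ToyEvent event;
  uint64_t pending = 0;  // value the most recently recorded event will signal
};

// Hypothetical buffer pool descriptor.
struct ToyPool {
  const char* name;
  std::size_t cached_bytes;
};

class ToyAllocator {
 public:
  // Record: reserve the next event value for this buffer and return it.
  uint64_t recordEvent(ToyBuffer& buf) { return ++buf.pending; }

  // Wait: block the caller until the recorded event has been signaled.
  void waitForEvent(ToyBuffer& buf) { buf.event.wait(buf.pending); }

  // Keeping the pools in one container lets a single loop visit all of them.
  std::size_t totalCachedBytes() const {
    std::size_t total = 0;
    for (const ToyPool& pool : pools_) total += pool.cached_bytes;
    return total;
  }

 private:
  std::vector<ToyPool> pools_{{"shared", 1024}, {"private", 2048}, {"scalar", 16}};
};

// Usage: the "GPU" thread writes the buffer and signals; the CPU waits
// on the event before reading.
int runDemo() {
  ToyAllocator allocator;
  ToyBuffer buf;
  const uint64_t ticket = allocator.recordEvent(buf);
  std::thread gpu([&] {
    buf.data = 42;
    buf.event.signal(ticket);
  });
  allocator.waitForEvent(buf);
  const int value = buf.data;  // safe: ordered after the signal
  gpu.join();
  assert(value == 42);
  return value;
}
```

The point of the sketch is only the ordering guarantee: the reader blocks on the event rather than on the whole command stream, and the pool container replaces per-pool special cases with one loop.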
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106938
Approved by: https://github.com/kulinseth