[PyTorch][CUDA Caching Allocator] Export sync-stream-and-free-HBM counter in memory_stats for performance debugging (#120050)
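
A minimal sketch of how the exported counter might be read for debugging. The key name `num_sync_all_streams` is an assumption, since this message does not spell it out, and the helper names are hypothetical. `torch.cuda.memory_stats()` returns a dictionary, so a missing key is treated as 0.

```python
from typing import Mapping

# Assumed key name; the commit message does not state it.
SYNC_COUNTER_KEY = "num_sync_all_streams"


def sync_and_free_count(stats: Mapping[str, int], key: str = SYNC_COUNTER_KEY) -> int:
    """Return the counter from a memory_stats-style mapping, or 0 if the key is absent."""
    return int(stats.get(key, 0))


def current_sync_and_free_count(device=None) -> int:
    """Read the counter from the live allocator; returns 0 when CUDA is unavailable."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return 0
    return sync_and_free_count(torch.cuda.memory_stats(device))
```

Sampling `current_sync_and_free_count()` before and after a training step and comparing the two values would show whether the allocator synchronized streams and freed device memory during that step.
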
Differential Revision: D53734057
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120050
Approved by: https://github.com/xw285cornell