Faster gc_count update for CUDACachingAllocator (#108071)
Summary: Modify the way we update gc_count in CUDACachingAllocator to make it faster.
Reviewed By: jaewonlee-fb
Differential Revision: D48481557
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108071
Approved by: https://github.com/zdevito