[CUDACaching Allocator] Release the allocator lock on the slow path (#108367)
Summary: This diff is to release the global allocator lock on the slow path when we do synchronous cudaMalloc call.
Differential Revision: D48750077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108367
Approved by: https://github.com/zdevito