[caffe2] Initialize ATen's CUDA before using the caching allocator (#39759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39759
Caffe2 has a mode in which it uses PyTorch's caching allocator, but we were not explicitly initializing ATen's CUDA state before using it. This change adds that explicit initialization.
I have no idea why it worked before. It is probably worth running a bisect separately.
Reviewed By: houseroad
Differential Revision: D21962331
fbshipit-source-id: f16ad6b27a67dbe0bda93939cca8c94620d22a09