Fix SIGSEGV in CudaIPCTypes.cpp. (#53080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53080
As described in https://github.com/pytorch/pytorch/issues/51619,
ProcessGroupShareTensorTest was failing due to segfaults in CudaIPCTypes.cpp.
There were two issues that had to be fixed for this:
1. The ref_counter_files_ map was looked up and the result was used without
checking whether the key existed in the map. When the key was missing, the
lookup default-constructed an entry, leaving a nullptr stored in the map.
2. ~CudaIPCSentData uses the global cuda_ipc_global_entities variable. But
destroying cuda_ipc_global_entities itself calls ~CudaIPCSentData, which then
accesses the already destroyed cuda_ipc_global_entities. This is now avoided by
clearing all shared blocks in ~CudaIPCGlobalEntities to ensure they are all
cleaned up before the destructor exits.
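For illustration only, both failure modes can be reproduced in a small
standalone sketch. It is not the code in CudaIPCTypes.cpp: RefCounterFile,
FileMap, SentData, Entities and the helper functions are hypothetical stand-ins,
and it assumes the map lookup used operator[] (which is what default
construction on a missing key suggests). A flag replaces the real access to a
destroyed global so the sketch stays well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the value type held in the map.
struct RefCounterFile {
  int used_slots = 0;
};

using FileMap = std::map<std::string, std::shared_ptr<RefCounterFile>>;

// Issue 1, unchecked: operator[] inserts a default-constructed (null)
// shared_ptr when the key is absent, so the map grows by one null entry.
std::size_t size_after_unchecked_lookup(FileMap& files, const std::string& key) {
  auto entry = files[key];
  (void)entry;
  return files.size();
}

// Issue 1, checked: find() never inserts; a missing key yields nullptr
// without modifying the map.
RefCounterFile* checked_lookup(FileMap& files, const std::string& key) {
  auto it = files.find(key);
  return it == files.end() ? nullptr : it->second.get();
}

// Issue 2: the flag models "the global is still usable".
bool g_entities_alive = false;
int g_late_accesses = 0;

// Stand-in for a block whose destructor depends on the global state.
struct SentData {
  ~SentData() {
    if (!g_entities_alive) {
      ++g_late_accesses;  // in real code this would touch a destroyed object
    }
  }
};

// Stand-in for the global owner of the blocks.
struct Entities {
  std::vector<std::unique_ptr<SentData>> blocks;
  bool clear_in_destructor;

  explicit Entities(bool clear) : clear_in_destructor(clear) {
    g_entities_alive = true;
  }

  ~Entities() {
    if (clear_in_destructor) {
      blocks.clear();  // runs ~SentData while the owner is still alive
    }
    g_entities_alive = false;
    // Members are destroyed after this body returns. Without the clear()
    // above, ~SentData would run here, after the owner is considered gone.
  }
};

// Counts how many blocks were destroyed after their owner stopped being alive.
int late_accesses_with(bool clear_in_destructor) {
  g_late_accesses = 0;
  {
    Entities entities(clear_in_destructor);
    entities.blocks.push_back(std::unique_ptr<SentData>(new SentData()));
    entities.blocks.push_back(std::unique_ptr<SentData>(new SentData()));
  }
  return g_late_accesses;
}
```

In this sketch, late_accesses_with(false) counts every block as a late access,
while late_accesses_with(true) counts none, which mirrors the effect of
clearing the shared blocks inside the destructor body.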
#Closes: https://github.com/pytorch/pytorch/issues/51619
ghstack-source-id: 122812319
Test Plan: Run `python test/distributed/test_c10d_spawn.py -v ProcessGroupShareTensorTest`
Reviewed By: VitalyFedyunin
Differential Revision: D26742332
fbshipit-source-id: 6de4c4533f5bca673e6e171af32d034bd6ade5bb