pytorch
8533a485 - Fix SIGSEGV in CudaIPCTypes.cpp. (#53080)

3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53080

As described in https://github.com/pytorch/pytorch/issues/51619, ProcessGroupShareTensorTest was failing because of segfaults in CudaIPCTypes.cpp. Fixing it required two changes:

1. The ref_counter_files_ map was looked up, and the result was used without checking whether the key existed. For a missing key, the lookup default-constructed a new entry, so a nullptr was stored in the map and then used.
2. ~CudaIPCSentData uses the global cuda_ipc_global_entities variable, but ~CudaIPCSentData is itself called while cuda_ipc_global_entities is being destroyed, so it accessed an already-destroyed object. This is now avoided by clearing all shared blocks in ~CudaIPCGlobalEntities, so that they are all cleaned up before the destructor exits.

Closes: https://github.com/pytorch/pytorch/issues/51619
ghstack-source-id: 122812319

Test Plan: Run `python test/distributed/test_c10d_spawn.py -v ProcessGroupShareTensorTest`

Reviewed By: VitalyFedyunin

Differential Revision: D26742332

fbshipit-source-id: 6de4c4533f5bca673e6e171af32d034bd6ade5bb
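The two failure patterns fixed in this commit can be illustrated with a minimal, self-contained C++ sketch. It is not the PyTorch code. `RefCounterFile`, `FileMap`, `find_ref_counter`, `Registry` and `SentBlock` are hypothetical stand-ins, and the use of `std::unordered_map` for the ref-counter map is an assumption. The first part shows a `find()`-based lookup that never inserts, unlike `operator[]`, which default-constructs a null entry for a missing key. The second part shows an owner whose destructor explicitly clears the blocks it holds, so each block's destructor can call back into the owner while all of the owner's members are still alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a ref-counter file handle (not the real PyTorch type).
struct RefCounterFile {
  int uses = 0;
};

// Assumption: the map is keyed by a handle string and owns its entries.
using FileMap = std::unordered_map<std::string, std::shared_ptr<RefCounterFile>>;

// Issue 1: files[handle] would insert a null shared_ptr for a missing key.
// find() only looks, so the caller gets nullptr and the map is unchanged.
RefCounterFile* find_ref_counter(const FileMap& files, const std::string& handle) {
  auto it = files.find(handle);
  return it == files.end() ? nullptr : it->second.get();
}

// Issue 2: blocks owned by a registry call back into it when destroyed.
struct Registry;

struct SentBlock {
  Registry& owner;
  explicit SentBlock(Registry& r) : owner(r) {}
  ~SentBlock();  // defined after Registry; notifies the owner
};

struct Registry {
  std::vector<std::string>& log;  // caller-owned, outlives the registry
  int live_blocks = 0;
  std::vector<std::unique_ptr<SentBlock>> blocks;

  explicit Registry(std::vector<std::string>& l) : log(l) {}

  void add_block() {
    blocks.push_back(std::make_unique<SentBlock>(*this));
    ++live_blocks;
  }

  void on_block_freed() {
    assert(live_blocks > 0);
    --live_blocks;
    log.push_back("block freed");
  }

  ~Registry() {
    // Clear inside the destructor body: every ~SentBlock runs here, while
    // the registry and all of its members are still fully alive, instead of
    // during implicit member destruction after the body has finished.
    blocks.clear();
    log.push_back("registry destroyed");
  }
};

SentBlock::~SentBlock() { owner.on_block_freed(); }
```

With the `find()` lookup, querying a missing handle returns `nullptr` and leaves `files.size()` unchanged, whereas `files["missing"]` would add a null entry. When a `Registry` holding two blocks goes out of scope, the log reads "block freed", "block freed", "registry destroyed".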