[pytorch/cuda] apply 16-bit mask to the index for device guard registry (#45485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45485
Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:
* https://github.com/pytorch/pytorch/pull/44883: this doesn't work because it trips a static assert at build time
```
caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^
```
* https://github.com/pytorch/pytorch/pull/44885: to be tested
This diff is a temporary hack to work around the problem: mask the registry index down to 16 bits before using it. Without this patch, the cast of the device type produces a corrupted index:
```
volatile size_t device_type = static_cast<size_t>(type);
auto p = device_guard_impl_registry[device_type].load();
C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type << ", guard: " << p;
// output
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0
```
Another workaround is D23788441, which changes -O3 to -O2, so this appears to be a miscompilation by nvcc or the host compiler.
Reviewed By: ezyang
Differential Revision: D23972356
fbshipit-source-id: ab91fbbfccb6389052de216f95cf9a8265445aea