6c1be299 - caffe2/c10/core/TensorImpl.h: adapt to clang 12 (#70973)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70973

clang 12 builds fail like this:

    caffe2/c10/core/TensorImpl.h:2615:1: error: static_assert failed due to requirement 'sizeof(void *) != sizeof(long) || sizeof(c10::TensorImpl) == sizeof(long) * 24' "You changed the size of TensorImpl on 64-bit arch. See Note [TensorImpl size constraints] on how to proceed."

Yet eliciting the size of that struct with this one-line addition:

    char (*__show_sizeof)[sizeof(TensorImpl)] = 1;

reports that its size is indeed 192 (i.e. 8 * 24):

    caffe2/c10/core/TensorImpl.h:2615:8: error: cannot initialize a variable of type 'char (*)[192]' with an rvalue of type 'int'

On closer inspection we determined that the failures occurred because TensorImpl was sometimes 208 bytes and other times 192 bytes. The 192-byte size was expected, and TensorImpl was hard-coded to raise an error for any other size on a 64-bit system, including the 208-byte case we found.

Additional investigation revealed that systems using GCC 11 and CUDA 11040, with either C++ 201402 or 201703, would sometimes yield TensorImpl sizes of 208, whereas newer systems without CUDA would always yield 192. The difference turned out to be that `std::unique_ptr` on NVCC systems is sometimes 16 bytes and other times 8 bytes, which fully accounts for the observed difference in TensorImpl sizes. We have not yet been able to find a set of preprocessor macros that predicts when each size will occur.

To handle the situation, we've added extensive debugging information to the TensorImpl size-checking logic. A set of preprocessor definitions captures compiler versions and other information to help understand what changes might have affected the size of TensorImpl. The size of each member of TensorImpl is now checked individually, along with the total size.
Template-based comparison functions provide compile-time output about the system state as well as the observed and expected sizes of each item checked. These templates would break the build on 32-bit systems, because the templates and their associated static_asserts are compiled whether or not they are ultimately used. In C++17 we could prevent this with `if constexpr`; however, PyTorch is pinned to C++14, so we cannot. Instead, we check the pointer size (`#if UINTPTR_MAX == 0xFFFFFFFF`) to determine which kind of system we are on and provide separate checks for 32-bit and 64-bit systems. A final wrinkle is that 32-bit systems show some variation in data sizes as well; we handle this by checking that the relevant items are `<=` the expected values.

In summary, improvements over the previous situation:
* Added checks for 32-bit systems
* The sizes of individual fields are now checked
* Compile-time size results (expected versus observed) are provided
* Compile-time compiler and system info is provided
* Landing this diff will actually enable checks of TensorImpl size; they are currently disabled to expedite LLVM-12 + newer CUDA upgrade efforts

Some work that could still be done:
* Figure out which preprocessor flags (if any) predict the size of `std::unique_ptr` on 64-bit systems and the sizes of various elements on 32-bit systems

Test Plan: Building no longer triggers that static_assert failure.

Reviewed By: luciang

Differential Revision: D32749655

fbshipit-source-id: 481f84da6ff61b876a5aaba89b8589ec54d59fbe