avoid CPU std::copysign segfault when compiling on arm64 with gcc 7.5 / 8 for CUDA (#51834)
Summary:
It seems that the std::copysign code introduced in https://github.com/pytorch/pytorch/issues/51706 is too much for gcc 7.5 / 8 when compiling on arm64 (e.g. on Jetson with the latest JetPack): the compiler segfaults with an internal compiler error during compilation. This avoids the compiler bug by not using std::copysign.
A very kind person sent a Jetson Xavier NX 🎁 thank you ❤.
After https://github.com/pytorch/pytorch/issues/51900 fixed this for CPU-only arm64 (e.g. Raspberry Pi), this fixes it for CUDA-using arm64 (e.g. Jetson). CUDA device lambdas must also be present as host functions for technical reasons, but the host versions are never called. So the CPU variant just asserts instead of actually doing the operation, which keeps std::copysign out of the host code that gcc has to compile.
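A minimal sketch of the host/device split described above, not the actual code in this change. The names `SKETCH_HOST_DEVICE`, `signed_zero_like` and `host_variant_rejects_call` are hypothetical, and the sketch throws where the real change asserts so that the host path can be exercised in a plain C++ build:

```cpp
#include <stdexcept>

// Hypothetical macro: only add CUDA qualifiers when compiling with nvcc,
// so the same file also builds with a plain host compiler.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical helper: returns a zero carrying the sign of `quotient`.
SKETCH_HOST_DEVICE inline double signed_zero_like(double quotient) {
#if defined(__CUDA_ARCH__)
  // Device compilation pass: do the real work with the CUDA math function.
  return ::copysign(0.0, quotient);
#else
  // Host compilation pass: this variant has to exist but should never be
  // called, so it contains no std::copysign call for gcc to choke on.
  (void)quotient;
  throw std::logic_error("device-only code path reached on the host");
#endif
}

// Hypothetical check that the host variant refuses to run.
inline bool host_variant_rejects_call() {
  try {
    signed_zero_like(-1.5);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

The point of the split is that the host compiler only ever sees the stub branch, so the construct that triggers the internal compiler error never reaches it.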
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51834
Reviewed By: mrshenli
Differential Revision: D27622277
Pulled By: malfet
fbshipit-source-id: a1dc4c3a67f925019782e24b796919e17339749f