[caffe2] fix atomicAdd redeclaration Clang error (#33559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33559
For sm_60 and above, CUDA provides the `atomicAdd(double*, double)` function, and for lower compute capabilities the CUDA C Programming Guide [1] suggests a user implementation like the one in this code. Clang's CUDA wrappers, on the other hand, declare this function unconditionally, regardless of compute capability, and emit an error if it actually gets used.
So the problem is: when Clang targets < sm_60, CUDA's `atomicAdd(double*, double)` cannot be used, yet it also cannot be redeclared in standards-compliant C++.
Work around the problem by using Clang's `enable_if` attribute [2], which as a side effect permits the function to be redeclared.
1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
2. https://clang.llvm.org/docs/AttributeReference.html#enable-if
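A minimal sketch of the pattern (the guards and exact placement here are illustrative, not the exact diff): the pre-sm_60 CAS-loop emulation from the CUDA C Programming Guide, with the declaration marked `enable_if(true, "")` under Clang so it is treated as a distinct overload rather than a conflicting redeclaration of the wrapper's built-in declaration.
```lang=cpp
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600)
#if defined(__clang__) && defined(__CUDA__)
// enable_if(true) makes this a distinct declaration for Clang, so it does
// not clash with the atomicAdd(double*, double) already declared by
// Clang's CUDA wrapper headers.
__attribute__((enable_if(true, "")))
#endif
inline __device__ double atomicAdd(double* address, double val) {
  // Emulation via compare-and-swap on the 64-bit integer representation,
  // as suggested by the CUDA C Programming Guide for < sm_60.
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
    // Uses integer comparison to avoid a hang in case of NaN
    // (since NaN != NaN).
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif
```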
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005113
fbshipit-source-id: d0d4bd6514f201af9cdeba1229bd9b798df0d02e