ATen port of lgamma (cuda) (#26600)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/24585.
As a side note, there are two ways to define unary operator support:
1. Use `IMPLEMENT_UNARY_OP_VEC_CUDA(someunaryop)` in `aten/src/ATen/native/UnaryOps.cpp` and in `native_functions.yaml` have:
```
- func: someunaryop(Tensor self) -> Tensor
use_c10_dispatcher: full
supports_named_tensor: True
variants: method, function
dispatch:
CPU: someunaryop
CUDA: someunaryop
```
2. Or, in `aten/src/ATen/native/UnaryOps.cpp` have:
```
// out variant: fills `result` using the per-device kernel registered on
// `someunaryop_stub` (see the CUDA kernel sketch after this list)
Tensor& someunaryop_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, someunaryop_stub); }
// functional variant: allocates a fresh output and delegates to the out variant
Tensor someunaryop(const Tensor& self) { return unary_op_impl(self, someunaryop_out); }
// in-place variant: writes the result back into `self` via the out variant
Tensor& someunaryop_(Tensor& self) { return unary_op_impl_(self, someunaryop_out); }
```
and in `native_functions.yaml` (note that the `dispatch` section is removed):
```
- func: someunaryop(Tensor self) -> Tensor
use_c10_dispatcher: full
supports_named_tensor: True
variants: method, function
```
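In both cases the actual computation comes from a per-device kernel registered on the stub. For the `lgamma` port, the CUDA side looks roughly like the sketch below (typically in `aten/src/ATen/native/cuda/UnaryOpsKernel.cu`; the kernel and stub names here follow the usual ATen convention and are not copied from this PR's diff):
```
#include <ATen/Dispatch.h>
#include <ATen/native/DispatchStub.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/UnaryOps.h>
#include <ATen/native/cuda/Loops.cuh>

namespace at { namespace native {

// Applies ::lgamma element-wise on the GPU for all floating types (incl. half).
void lgamma_kernel_cuda(TensorIterator& iter) {
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(iter.dtype(), "lgamma_cuda", [&]() {
    gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t {
      return ::lgamma(a);
    });
  });
}

// Makes lgamma_stub dispatch to this kernel for CUDA tensors.
REGISTER_DISPATCH(lgamma_stub, &lgamma_kernel_cuda);

}} // namespace at::native
```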
It turns out that way 1 is 3% more performant than way 2.
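With either way, since `variants: method, function` is set, the ported op ends up callable as both a free function and a tensor method; e.g. for `lgamma`:
```
#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::rand({4}, at::kCUDA);
  at::Tensor a = at::lgamma(t); // function variant
  at::Tensor b = t.lgamma();    // method variant
  t.lgamma_();                  // in-place variant, runs the CUDA kernel above
}
```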
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26600
Differential Revision: D17527166
Pulled By: ezyang
fbshipit-source-id: 112ba298ad3f67d04078b921859e73dcd184852b