Migrate `cosh` and `cosh_` from TH to ATen (CUDA) (#36654)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24546
Benchmarked with the same build settings on the same system:
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : GTX 1050 Ti
```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cosh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit('torch.cosh(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2813017509997735
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.28355878599904827
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.27810572300040803
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.3239932899996347
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.321233343998756
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5546665399997437
```
After:
```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2905335750001541
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.27596429500044906
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.30358699899989006
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.30139567500009434
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.30246640400036995
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5403946970000106
```
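For reference, the function under test computes cosh(x) = (eˣ + e⁻ˣ) / 2. A quick stdlib-only sanity check of that identity (independent of PyTorch, so it runs without a GPU; `cosh_ref` is a hypothetical helper written here for illustration, not part of the PR):

```python
import math

def cosh_ref(x):
    # Reference definition: cosh(x) = (e^x + e^-x) / 2
    return (math.exp(x) + math.exp(-x)) / 2.0

# The identity should hold to double precision across a range of inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert math.isclose(cosh_ref(x), math.cosh(x), rel_tol=1e-12)
```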
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36654
Differential Revision: D21164606
Pulled By: VitalyFedyunin
fbshipit-source-id: 55e88f94044957f81599ae3c12cda38a3e2c985c