Move legacy TH functions (sinh, cosh) to TensorIterator + Vec256 (#21115)
Summary:
This is a follow-up to James's PR: https://github.com/pytorch/pytorch/pull/19041. The idea is to replace the legacy `sinh` / `cosh` ops that are currently dispatched to TH with the operations defined in `Vec256`, for better performance.
Benchmark (from James's script):
```python
import torch, time

ops = ['sinh', 'cosh']
x = torch.rand(1024, 1024)  # 1M elements, float32 by default
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()  # call x.sinh() / x.cosh()
    elapsed_sec = (time.time() - s) / NITER
    # GB/s counts 4-byte elements, each read once and written once (hence *4*2).
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
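As a reading aid, the two throughput columns printed by the script above can be recomputed from the time per iteration alone. This is a minimal sketch; the helper name `throughput` and the constant `FLOAT32_BYTES` are hypothetical, and it assumes the tensor is float32 (the default for `torch.rand`):

```python
FLOAT32_BYTES = 4  # assumption: torch.rand produces float32 by default


def throughput(ms_per_iter, numel=1024 * 1024):
    """Return (gops/s, GB/s) for one elementwise op over `numel` elements."""
    sec = ms_per_iter / 1000.0
    gops = numel / sec / 1e9
    # Each element is read once and written once, hence the factor of 2.
    gbytes = numel * FLOAT32_BYTES * 2 / sec / 1e9
    return gops, gbytes
```

Because the bytes moved per element are a constant 8, GB/s is always 8 times gops/s in these results.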
Before (code on master):
```
op    time per iter (ms)    gops/s                GB/s
sinh  3.37614369392395      0.3105839369002935    2.484671495202348
cosh  3.480502033233643     0.3012714803748572    2.4101718429988574
```
After this change (on a 2018 MacBook Pro):
```
op    time per iter (ms)    gops/s                GB/s
sinh  0.8956503868103027    1.1707425301677301    9.365940241341841
cosh  0.9392147302627564    1.1164390487217428    8.931512389773943
```
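To put the two sets of results side by side, the speedup can be computed directly from the reported times per iteration. A minimal sketch, assuming both runs are comparable (the message only names the hardware for the "after" run); the variable names are illustrative:

```python
# Times per iteration in ms, copied from the two result tables in this message.
before_ms = {"sinh": 3.37614369392395, "cosh": 3.480502033233643}
after_ms = {"sinh": 0.8956503868103027, "cosh": 0.9392147302627564}

# Speedup is the ratio of old time to new time for each op.
speedup = {op: before_ms[op] / after_ms[op] for op in before_ms}
for op, ratio in speedup.items():
    print(f"{op}: {ratio:.2f}x faster")
```

With the numbers above this works out to roughly 3.77x for `sinh` and 3.71x for `cosh`.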
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21115
Reviewed By: ljk53
Differential Revision: D15574580
Pulled By: xta0
fbshipit-source-id: 392546a0df11ed4f0945f2bc84bf5dea2750b60e