Migrate frac from TH to ATen (CUDA) (#28953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28953
Close #24566
Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc
7.4):
```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.frac(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit('torch.frac(a); torch.cuda.synchronize()',
                            setup=f'import torch; a = torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
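For context, `torch.frac` returns the fractional portion of each element, with the result keeping the sign of the input. A minimal pure-Python sketch of the same semantics (the helper name `frac` is illustrative, not part of any API):

```python
import math

def frac(x):
    # math.modf splits a float into (fractional, integral) parts,
    # both carrying the sign of x -- matching torch.frac semantics.
    fractional, _integral = math.modf(x)
    return fractional

print(frac(2.75))   # 0.75
print(frac(-2.75))  # -0.75
```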
Before:
```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3608182370007853
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3647012189976522
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.3889585220022127
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.622635444997286
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9595754649999435
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5590267750012572
```
After:
```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3675256470014574
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3703597319981782
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.372184894993552
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.60767333900003
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9645607889979146
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5542530329985311
```
Test Plan: Imported from OSS
Differential Revision: D18302768
Pulled By: VitalyFedyunin
fbshipit-source-id: 24198838dc903d455155f0819d0c7d58974aaecd