Port `equal` from THC to ATen (CUDA) (#36483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24557
ASV benchmark:
```
import torch

# Tensor shapes to benchmark; the ASV param is an index into this list.
sizes = [
    (10**6,),
    (1000, 1000),
    (10, 10),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
]


class EqualTrue:
    # Compares a tensor with its clone, so every element matches.
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = self.a.clone()

    def time_equal(self, n):
        torch.equal(self.a, self.b)


class EqualFalse:
    # Compares two independent random tensors of the same shape.
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = torch.rand(dims, device='cuda')

    def time_equal(self, n):
        torch.equal(self.a, self.b)
```
Old results:
```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0       67.7±7μs
                 1       74.0±2μs
                 2      24.4±0.1μs
                 3      135±0.2μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      59.8±0.2μs
                 1      59.9±0.3μs
                 2      25.0±0.5μs
                 3      136±0.2μs
              ======== ============
```
New results:
```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.4±0.2μs
                 1      44.5±0.4μs
                 2      31.3±0.3μs
                 3      96.6±0.5μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.2±0.2μs
                 1      44.6±0.2μs
                 2      30.8±0.3μs
                 3      97.3±0.2μs
              ======== ============
```
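Comparison of old and new results (a minimal sketch, not part of the PR: the `old`/`new` dicts and the `ratios` helper are illustrative names, the values are the reported means copied from the tables above, and the ± error bars are ignored):
```python
# Mean timings in microseconds, copied from the ASV tables above.
# List index = benchmark param, i.e. the position in `sizes`.
old = {
    "EqualFalse": [67.7, 74.0, 24.4, 135.0],
    "EqualTrue": [59.8, 59.9, 25.0, 136.0],
}
new = {
    "EqualFalse": [44.4, 44.5, 31.3, 96.6],
    "EqualTrue": [44.2, 44.6, 30.8, 97.3],
}


def ratios(name):
    """Return the new/old time ratio per param; below 1 means the new code is faster."""
    return [n / o for n, o in zip(new[name], old[name])]


for name in old:
    for param, r in enumerate(ratios(name)):
        print(f"{name} param={param}: new/old = {r:.2f}")
```
On these means the new implementation is faster for params 0, 1 and 3 and slower for param 2 (the 10x10 tensor) in both benchmarks.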
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36483
Differential Revision: D21451829
Pulled By: VitalyFedyunin
fbshipit-source-id: 033e8060192c54f139310aeafe8ba784bab94ded