copy_: Short-circuit when self and src view the same data (#88884)
This comes up if you use in-place operators on a slice, e.g.:
```python
import torch
a = torch.rand(1000000, device="cuda")
a[::2] *= 2
```
The last line looks as if it should be fully in-place, but it is actually
equivalent to:
```python
tmp = a[::2]
tmp *= 2
a[::2] = tmp
```
This results in both `mul_` and `copy_` being called, even though `tmp`
and `a[::2]` already view the same data. With this PR, the redundant copy
becomes a no-op and the above example is 2x faster.
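
As a rough illustration of what "view the same data" means here, the following is a toy pure-Python model, not the code from this PR. `StridedView`, `views_same_data` and this `copy_` are hypothetical names for this sketch. It assumes two views are the same data when they share a buffer and agree on offset, size and stride. The actual check lives in the C++ implementation of `copy_` and may compare more properties (for example dtype).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StridedView:
    """Hypothetical 1-D stand-in for a tensor: a strided window onto a buffer."""
    storage: List[float]  # shared underlying buffer
    offset: int           # index of the first element in the buffer
    size: int             # number of elements in the view
    stride: int           # step between consecutive elements


def views_same_data(dst: StridedView, src: StridedView) -> bool:
    # Same buffer object, same start, same extent and same step means every
    # element of dst already is the corresponding element of src.
    return (
        dst.storage is src.storage
        and dst.offset == src.offset
        and dst.size == src.size
        and dst.stride == src.stride
    )


def copy_(dst: StridedView, src: StridedView) -> bool:
    """Copy src into dst; return True only if elements were actually written."""
    if views_same_data(dst, src):
        return False  # short-circuit: the copy would be a no-op
    for i in range(dst.size):
        dst.storage[dst.offset + i * dst.stride] = src.storage[src.offset + i * src.stride]
    return True


buf = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tmp = StridedView(buf, offset=0, size=3, stride=2)     # like tmp = a[::2]
target = StridedView(buf, offset=0, size=3, stride=2)  # like a[::2] on the left-hand side
other = StridedView(buf, offset=1, size=3, stride=2)   # like a[1::2], different elements

skipped = not copy_(target, tmp)  # same data: nothing is written
copied = copy_(other, tmp)        # different data: elements are written
```

In this model the `a[::2] = tmp` step from the example above corresponds to `copy_(target, tmp)`, which returns without touching the buffer.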
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88884
Approved by: https://github.com/ngimel