Stops cross-device data movement in tensor iterator (#38998)
Summary:
**BC-breaking note:**
In previous versions of PyTorch, zero-dimensional CUDA tensors could be moved across devices implicitly. For example,
```
torch.tensor(5, device='cuda:0') + torch.tensor((1, 1), device='cuda:1')
```
would work, even though the tensors are on different CUDA devices. This is a frequent source of user confusion, however, and PyTorch generally does not move data across devices unless explicitly asked to. This functionality is removed in PyTorch 1.6, where such operations throw an error instead.
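Code that relied on the old behavior can make the data movement explicit instead. The following is a minimal sketch of one way to do that, not something this PR adds: the helper name `add_explicit` is hypothetical, and the exception type caught below is an assumption.
```python
import torch


def add_explicit(scalar, other):
    """Add two tensors after explicitly moving `scalar` to `other`'s device.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    # Tensor.to() returns the tensor unchanged if it is already on that device.
    return scalar.to(other.device) + other


if torch.cuda.is_available():
    s = torch.tensor(2, device='cuda')  # zero-dimensional CUDA tensor
    v = torch.tensor((3, 5))            # non-scalar CPU tensor
    try:
        s + v  # implicit cross-device move, rejected after this change
    except RuntimeError as err:  # assumption: surfaced as a RuntimeError
        print("implicit move rejected:", err)
    print(add_explicit(s, v))  # explicit move to the CPU, then add
```
The helper moves the zero-dimensional tensor to the other operand's device, mirroring the direction the old implicit behavior used. Calling `.to()` on the non-scalar tensor instead is equally valid when the result should live on the scalar's device.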
**PR Summary:**
Today in PyTorch we allow implicit data movement of zero-dimensional CUDA tensors. For example, we allow:
```
torch.tensor(5, device='cuda:0') + torch.tensor((1, 1), device='cuda:1')
```
and
```
torch.tensor(2, device='cuda') + torch.tensor((3, 5))
```
In both of these cases, TensorIterator would move the zero-dimensional CUDA tensor to the device of the non-scalar tensor (cuda:1 in the first snippet, the CPU in the second snippet).
One of PyTorch's fundamental rules, however, is that it does not perform implicit data movement like this, and this change causes these cases to throw an error. New tests for this behavior are added to test_torch.py, and tests of the old behavior are removed from test_torch.py and test_autograd.py. A cpp test in tensor_iterator_test.cpp is modified to account for the new behavior.
This addresses https://github.com/pytorch/pytorch/issues/36722.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38998
Differential Revision: D21757617
Pulled By: mruberry
fbshipit-source-id: 2498f07f4938d6de691fdbd5155ad2e881ff7fdb