extend gather shape check to handle incorrectly sized outputs (#37102)
Summary:
Fixes a safety issue (nonsense values and segfaults) introduced by https://github.com/pytorch/pytorch/pull/36875 when in-place gather tries to use incorrect shapes.
Consider the following block of code:
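```
import torch

k0 = 8
k1 = 8
m = 100
x = torch.rand((k0, k1))
ind = torch.randint(0, k0, (m, k1))
output = torch.empty((m, k1))
# Legal: along dim 0, ind only has to fit x in dim 1 (k1 vs. k1).
print(torch.gather(x, 0, ind, out=output))
# Illegal: along dim 1, ind.size(0) = m exceeds x.size(0) = k0.
print(torch.gather(x, 1, ind, out=output))
```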
The first gather is legal; the second is not (`ind` and `output` would need to be transposed). Previously this was caught when the kernel tried to restride inputs for TensorIterator, but we can no longer rely on those checks and must test the shapes explicitly. If `m` is small, the second gather returns gibberish; if it is large enough to push the read outside the allocated memory block, the program segfaults.
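For illustration, the kind of explicit shape test described above can be sketched in plain Python on shape tuples. This is not the code added by this PR: `gather_shapes_ok` is a hypothetical helper, and the rule it encodes is an assumption taken from the documented `torch.gather` contract (same number of dimensions, `index.size(d) <= input.size(d)` for every `d != dim`, and `out` shaped like `index`).

```python
def gather_shapes_ok(self_shape, dim, index_shape, out_shape):
    """Hypothetical helper: True if the shapes are valid for a gather along `dim`.

    Assumed rule (from the torch.gather documentation, not from this PR):
      1. index and input have the same number of dimensions,
      2. out has exactly the shape of index,
      3. index.size(d) <= input.size(d) for every d != dim.
    """
    if len(index_shape) != len(self_shape):
        return False
    if tuple(out_shape) != tuple(index_shape):
        return False
    return all(
        index_shape[d] <= self_shape[d]
        for d in range(len(self_shape))
        if d != dim
    )


k0, k1, m = 8, 8, 100
# Same shapes as the example above: x is (k0, k1), ind and output are (m, k1).
print(gather_shapes_ok((k0, k1), 0, (m, k1), (m, k1)))  # True
print(gather_shapes_ok((k0, k1), 1, (m, k1), (m, k1)))  # False
# Transposing ind and output makes the gather along dim 1 legal.
print(gather_shapes_ok((k0, k1), 1, (k1, m), (k1, m)))  # True
```

Doing this test up front, before any kernel runs, is what turns the out-of-bounds read into an ordinary error instead of gibberish or a segfault.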
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37102
Differential Revision: D21190580
Pulled By: robieta
fbshipit-source-id: 80175620d24ad3380d78995f7ec7dbf2627d2998