Fix ellipsis behavior for `Tensor.align_to` to glob all missing dims (#26648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26648
Previously:
- `Tensor.align_to(*names)` only works on fully named tensors. In addition, the
desired ordering `names` must not have any None-names.
- `Tensor.align_to(*names)` accepted `...`, but expanded it based on
position. i.e., in `tensor.align_to('N', ..., 'C', 'H')`, `...` expand
to `*tensor.names[1:-2]`. This is wildly incorrect: see the following
concrete example.
```
tensor = tensor.refine_names('N', 'C', 'H, 'W')
tensor.align_to('W', ...) # ... expands to 'C', 'H, 'W'
```
This PR changes it so that `...` in `tensor.align_to` grabs all
unmentioned dimensions from `tensor`, in the order that they appear.
`align_to` is the only function that takes ellipsis that requires this
change. This is because all other functions (`refine_to`) require their
list of names to work in a positional manner, but `align_to` lets the
user reorder dimensions.
This does not add very much overhead to `align_to`, as shown in the
following benchmark. However, in the future, we should resolve to make
these operations faster; align_to should be as fast as view but isn't
most likely due to Python overhead.
```
[ins] In [2]: import torch
...: named = torch.randn(3, 3, 3, 3, names=('N', 'C', 'H', 'W'))
...: unnamed = torch.randn(3, 3, 3, 3)
...: %timeit unnamed[:]
...: %timeit unnamed.view(-1)
...: %timeit named.align_to(...)
...: %timeit named.align_to('N', 'C', 'H', 'W')
31 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
43.8 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
69.6 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66.1 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Test Plan:
- new tests [namedtensor ci]
allows the user to transpose and permute dimensions.
Differential Revision: D17528207
Pulled By: zou3519
fbshipit-source-id: 4efc70329f84058c245202d0b267d0bc5ce42069