pytorch
c8bc298d - streamline stride propagation logic in TensorIterator (#42922)

Commit

5 years ago

Summary:
Fixes https://github.com/pytorch/pytorch/issues/41314, among other things.

This PR streamlines the layout propagation logic in TensorIterator and removes almost all cases of channels-last hardcoding. The new rules and changes are as follows:

1) The behavior for an undefined `output` and for a defined output of the wrong size (e.g. size 0) is now always the same (before this PR the two behaviors diverged).
2) In obvious cases (a unary operation on a memory-dense tensor, or a binary operation on memory-dense tensors with the same layout), strides are propagated (before this PR, propagation was inconsistent; see the footnote).
3) In all other cases, the output permutation is obtained as the inverse of the permutation that sorts the inputs by strides. Sorting uses a comparator that obeys the following rules:
   - Strides of broadcasted dimensions are set to 0, and 0 compares equal to anything.
   - Strides of non-broadcasted dimensions (including dimensions of size `1`) participate in sorting.
   - Precedence is given to the first input. In case of a stride tie in the first input, the sizes of the two dimensions are compared first; if that does not indicate that a swap is needed, the strides of the same dimensions in subsequent inputs are considered.

   See the changes in `reorder_dimensions` and `compute_strides`.

Note that inspecting the dimensions of the first input first lets us better recover its permutation (we chose this behavior because it propagates channels-last strides more reliably), but in some rare cases it can result in a worse traversal order for the second tensor.

These rules are enough to recover the previously hard-coded channels-last behavior, so all existing tests pass. In general, they produce intuitive results: in most cases, the permutation of the full-size input (for a broadcasted operation) or of the first input (for same-sized inputs) is recovered, including cases with trivial (size-1) dimensions.
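A minimal pure-Python sketch of rule 3, written from the description in this message rather than copied from `TensorIterator`. The helpers `should_swap`, `reorder_dimensions`, and `output_strides` here are hypothetical Python stand-ins; strides are in elements rather than bytes, outputs and reductions are ignored, broadcasted dimensions are assumed to already carry stride 0, and the tie-break on equal strides (the larger dimension moves outward) is an assumption about how the size comparison works. The 1d tensors in the examples are assumed to have sizes 3 and 1 respectively.

```python
def should_swap(dim0, dim1, shape, input_strides):
    """Compare two dimensions: 1 means swap, -1 keep order, 0 undecided."""
    for strides in input_strides:          # the first input has precedence
        s0, s1 = strides[dim0], strides[dim1]
        if s0 == 0 or s1 == 0:
            continue                       # broadcasted: 0 compares equal to anything
        if s0 < s1:
            return -1
        if s0 > s1:
            return 1
        # Stride tie: compare sizes; only a decision to swap ends the search
        if shape[dim0] > shape[dim1]:
            return 1
    return 0


def reorder_dimensions(shape, input_strides):
    """Return dimensions ordered from innermost (fastest) to outermost."""
    ndim = len(shape)
    perm = list(range(ndim - 1, -1, -1))   # start with the last dimension innermost
    # Insertion sort that tolerates undecided (0) comparisons
    for i in range(1, ndim):
        dim1 = i
        for dim0 in range(i - 1, -1, -1):
            order = should_swap(perm[dim0], perm[dim1], shape, input_strides)
            if order > 0:
                perm[dim0], perm[dim1] = perm[dim1], perm[dim0]
                dim1 = dim0
            elif order < 0:
                break
    return perm


def output_strides(shape, input_strides):
    """Dense output strides laid out in the recovered permutation."""
    out = [0] * len(shape)
    step = 1
    for d in reorder_dimensions(shape, input_strides):
        out[d] = step
        step *= shape[d]
    return tuple(out)


# a + a for the strided tensor in the table: shape (4, 2, 3), strides (2, 24, 8)
print(output_strides((4, 2, 3), [(2, 24, 8), (2, 24, 8)]))          # (1, 12, 4)
# randn(2,1,3).permute(1,0,2) plus a size-3 1d tensor (broadcast strides 0, 0, 1)
print(output_strides((1, 2, 3), [(3, 3, 1), (0, 0, 1)]))            # (3, 3, 1)
# channels-last N1H1 (N=2, H=4) plus a size-1 1d tensor
print(output_strides((2, 1, 4, 1), [(4, 1, 1, 1), (0, 0, 0, 1)]))   # (4, 1, 1, 1)
```

Because a tie in the first input only ends the search when it calls for a swap, later inputs still get a say about dimensions the first input cannot order, which matches the precedence rule described above.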
As an example of the latter, the tensor

```
x=torch.randn(2,1,3).permute(1,0,2)
```

produces an output with the same strides (3, 3, 1) in binary operations with a 1d tensor. Another example is a tensor of size N1H1, which has strides `H, H, 1, 1` when contiguous and `H, 1, 1, 1` when channels-last. The output retains these strides in binary operations in which another 1d tensor is broadcast against it.

Footnote: for ambiguous cases where all inputs are memory-dense and have the same physical layout that can nevertheless correspond to different permutations (e.g. physically contiguous tensors of size NC11), a regular contiguous tensor is returned, and the permutation information of the input is lost (for NC11, a channels-last input has strides `C, 1, C, C`, but the output has strides `C, 1, 1, 1`). This behavior is unchanged from before and consistent with NumPy, but it would still make sense to change it. The current blocker is the performance of `empty_strided`; once it is on par with `empty`, we should be able to propagate layouts in these cases as well. For now, to avoid slowing down the common contiguous case, we default to contiguous.

The table below shows cases where the old behavior loses permutation/stride information, whereas the new behavior propagates the permutation.
| code | old | new |
|---|---|---|
| `# strided tensors` | | |
| `a=torch.randn(2,3,8)[:,:,::2].permute(2,0,1)` | | |
| `print(a.stride())` | (2, 24, 8) | (2, 24, 8) |
| `print(a.exp().stride())` | (6, 3, 1) | (1, 12, 4) |
| `print((a+a).stride())` | (1, 12, 4) | (1, 12, 4) |
| `out = torch.empty(0); torch.add(a,a,out=out); print(out.stride())` | (6, 3, 1) | (1, 12, 4) |
| `# memory dense tensors` | | |
| `a=torch.randn(3,1,1).as_strided((3,1,1), (1,3,3))` | | |
| `print(a.stride(), (a+torch.randn(1)).stride())` | (1, 3, 3) (1, 1, 1) | (1, 3, 3) (1, 3, 3) |
| `a=torch.randn(2,3,4).permute(2,0,1)` | | |
| `print(a.stride())` | (1, 12, 4) | (1, 12, 4) |
| `print(a.exp().stride())` | (6, 3, 1) | (1, 12, 4) |
| `print((a+a).stride())` | (1, 12, 4) | (1, 12, 4) |
| `out = torch.empty(0); torch.add(a,a,out=out); print(out.stride())` | (6, 3, 1) | (1, 12, 4) |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42922

Reviewed By: ezyang

Differential Revision: D23148204

Pulled By: ngimel

fbshipit-source-id: 670fb6188c7288e506e5ee488a0e11efc8442d1f
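The stride values quoted in the examples and the table above can be checked by hand. The pure-Python sketch below models two view operations (`slice_every` and `permute` are hypothetical helpers that mimic `x[..., ::step]` and `Tensor.permute`; they are not PyTorch APIs), computes contiguous and channels-last strides for a 4-d NCHW shape, and illustrates the footnote: for NC11 both layouts visit memory in exactly the same order, so the physical layout alone cannot tell them apart. Strides are in elements and all sizes are assumed positive.

```python
from itertools import product


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


def channels_last_strides(shape):
    """Strides of a logical NCHW tensor stored physically as NHWC."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def slice_every(shape, strides, dim, step):
    """Model of x[..., ::step] along `dim`: fewer elements, stride scaled by step."""
    shape, strides = list(shape), list(strides)
    shape[dim] = (shape[dim] + step - 1) // step
    strides[dim] *= step
    return tuple(shape), tuple(strides)


def permute(shape, strides, dims):
    """Model of x.permute(*dims): sizes and strides are reordered together."""
    return tuple(shape[d] for d in dims), tuple(strides[d] for d in dims)


def element_offsets(shape, strides):
    """Memory offset of every element, visited in logical (row-major) order."""
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in shape))]


# Strided row of the table: randn(2,3,8)[:,:,::2].permute(2,0,1)
shape, strides = (2, 3, 8), contiguous_strides((2, 3, 8))
shape, strides = slice_every(shape, strides, dim=2, step=2)
shape, strides = permute(shape, strides, (2, 0, 1))
print(shape, strides)               # (4, 2, 3) (2, 24, 8)
print(contiguous_strides(shape))    # (6, 3, 1), matching the old column for exp()

# Memory-dense row: randn(2,3,4).permute(2,0,1)
print(permute((2, 3, 4), contiguous_strides((2, 3, 4)), (2, 0, 1)))  # ((4, 2, 3), (1, 12, 4))

# N1H1 with N=2, H=4, and NC11 with N=2, C=3
print(contiguous_strides((2, 1, 4, 1)), channels_last_strides((2, 1, 4, 1)))  # (4, 4, 1, 1) (4, 1, 1, 1)
nc11 = (2, 3, 1, 1)
print(contiguous_strides(nc11), channels_last_strides(nc11))  # (3, 1, 1, 1) (3, 1, 3, 3)
# Both NC11 layouts address memory identically, hence the ambiguity
print(element_offsets(nc11, contiguous_strides(nc11))
      == element_offsets(nc11, channels_last_strides(nc11)))  # True
```

For a full NCHW shape such as (2, 3, 4, 5) the two layouts do produce different offsets, which is why the ambiguity only arises when the spatial dimensions are trivial.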

Author: Natalia Gimelshein

Committer: facebook-github-bot

Parents: ca9d4401
