pytorch
609cde94 - DTensor: use memory_format in the hash for all aten ops that use that arg (e.g. aten.clone) (#118667)

Commit

214 days ago

DTensor: use memory_format in the hash for all aten ops that use that arg (e.g. aten.clone) (#118667)

This fixes an internal DTensor enablement bug (I don't have an OSS issue for it).

I finally root-caused this as follows:

(1) we were fakeifying a DTensor graph input that was an autograd non-leaf (it had a grad_fn)

(2) that caused it to go through this `clone()` call during fakeification: https://github.com/pytorch/pytorch/blob/main/torch/_subclasses/meta_utils.py#L549

(3) `clone(torch.preserve_format)` is supposed to return another DTensor with the same strides as the input, but I noticed we were incorrectly returning a DTensor with contiguous strides.

(4) It turns out that DTensor was hashing the sharding strategy for `aten.clone` without regard to the `memory_format` kwarg that was passed in.

I could have manually updated the `clone` sharding strategy registration to take `memory_format` into account. But instead, I figured that every aten op with a sharding strategy needs to handle the `memory_format` kwarg specially, so I tried to generically force DTensor to consider all ATen ops that take a `memory_format` kwarg during hashing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118667
Approved by: https://github.com/wanchaol
ghstack dependencies: #117667, #117666, #118209, #118191
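To make the failure mode in step (4) concrete, here is a minimal pure-Python sketch of a cache whose key may or may not include `memory_format`. It is a toy model, not DTensor's actual implementation: `ToyPropagationCache`, `toy_clone_strides`, `contiguous_strides`, and the string-valued memory formats are all hypothetical names introduced only for illustration.

```python
def contiguous_strides(shape):
    """Row-major (contiguous) strides for a given shape."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def toy_clone_strides(shape, strides, memory_format):
    """Hypothetical stand-in for computing the output strides of clone()."""
    if memory_format == "preserve_format":
        return strides  # keep the input layout
    return contiguous_strides(shape)  # force a contiguous layout


class ToyPropagationCache:
    """Toy cache keyed on the op and its inputs (assumption: not the real DTensor cache)."""

    def __init__(self, hash_memory_format):
        self.hash_memory_format = hash_memory_format
        self._cache = {}

    def propagate(self, op, shape, strides, memory_format="preserve_format"):
        key = (op, shape, strides)
        if self.hash_memory_format:
            # The fix, in spirit: make the kwarg part of the cache key.
            key += (("memory_format", memory_format),)
        if key not in self._cache:
            self._cache[key] = toy_clone_strides(shape, strides, memory_format)
        return self._cache[key]


shape, strides = (2, 3), (1, 2)  # a non-contiguous (column-major) layout

# Key ignores memory_format: the second call reuses the contiguous result.
buggy = ToyPropagationCache(hash_memory_format=False)
buggy.propagate("clone", shape, strides, "contiguous_format")
buggy_result = buggy.propagate("clone", shape, strides, "preserve_format")

# Key includes memory_format: each format gets its own cache entry.
fixed = ToyPropagationCache(hash_memory_format=True)
fixed.propagate("clone", shape, strides, "contiguous_format")
fixed_result = fixed.propagate("clone", shape, strides, "preserve_format")
```

In this sketch, `buggy_result` comes back with contiguous strides even though `preserve_format` was requested, which mirrors the symptom in step (3); `fixed_result` keeps the input strides because the two calls no longer share a cache entry.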

Author

bdhirsh

Committer

pytorchmergebot

Parents

6819452a
