DTensor: use memory_format in the hash for all aten ops that use that arg (e.g. aten.clone) (#118667)
This fixes an internal DTensor enablement bug (I don't have an OSS issue for it)
I finally root-caused this as follows:
(1) we were fakeifying a DTensor graph input that was an autograd non-leaf (it had a grad_fn)
(2) that caused it to go through this `clone()` call during fakeification: https://github.com/pytorch/pytorch/blob/main/torch/_subclasses/meta_utils.py#L549
(3) `clone(memory_format=torch.preserve_format)` is supposed to return another DTensor with the same strides as the input, but I noticed we were incorrectly returning a DTensor with contiguous strides.
(4) It turns out that DTensor's hash for the `aten.clone` sharding strategy ignored the `memory_format` kwarg that was passed in, so calls that differed only in `memory_format` hashed the same.
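For reference, the stride behavior that step (3) expects can be seen on a plain `torch.Tensor`. This is only an illustrative sketch: it does not use DTensor, and the variable names are mine.

```python
import torch

# A transposed view: shape (3, 2), non-contiguous but dense, strides (1, 3).
x = torch.arange(6.0).reshape(2, 3).t()

# preserve_format keeps the input's strides for a dense, non-overlapping tensor.
preserved = x.clone(memory_format=torch.preserve_format)

# contiguous_format produces a row-major layout, strides (2, 1).
contiguous = x.clone(memory_format=torch.contiguous_format)

print(x.stride(), preserved.stride(), contiguous.stride())
```

The values are identical in both clones; only the strides differ, which is why a result cached for one `memory_format` is wrong for the other.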
I could have manually updated the `clone` sharding strategy registration to take `memory_format` into account. But instead, I figured that every aten op with a sharding strategy needs to handle the `memory_format` kwarg specially - so I tried to generically force DTensor to include `memory_format` in the hash for all ATen ops that take that kwarg.
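A toy sketch of the failure mode and the fix, in pure Python. None of these names are DTensor APIs: `output_layout`, the key functions, and the string-valued "memory formats" are hypothetical stand-ins for a cache keyed on an op call.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

# (op name, positional args, kwargs): a stand-in for an op call, not a real schema.
Call = Tuple[str, Tuple[Any, ...], Dict[str, Any]]

def output_layout(call: Call) -> str:
    """Pretend strategy computation: the result depends on memory_format."""
    _, _, kwargs = call
    fmt = kwargs.get("memory_format", "preserve_format")
    return "input strides" if fmt == "preserve_format" else "contiguous strides"

def key_without_kwargs(call: Call) -> Hashable:
    """Buggy key: kwargs are ignored entirely."""
    op, args, _ = call
    return (op, args)

def key_with_memory_format(call: Call) -> Hashable:
    """Fixed key: memory_format participates in the hash when present."""
    op, args, kwargs = call
    return (op, args, kwargs.get("memory_format"))

def run(key_fn: Callable[[Call], Hashable], calls: List[Call]) -> List[str]:
    """Look each call up in a fresh cache built with the given key function."""
    cache: Dict[Hashable, str] = {}
    results = []
    for call in calls:
        key = key_fn(call)
        if key not in cache:
            cache[key] = output_layout(call)
        results.append(cache[key])
    return results

calls: List[Call] = [
    ("aten.clone", ("shard(0)",), {"memory_format": "contiguous_format"}),
    ("aten.clone", ("shard(0)",), {"memory_format": "preserve_format"}),
]

stale = run(key_without_kwargs, calls)      # second call reuses the first entry
fixed = run(key_with_memory_format, calls)  # each call gets its own entry
print(stale, fixed)
```

With the buggy key, the `preserve_format` call gets back the cached contiguous result. Applying the key change generically, rather than per op, means any op that accepts `memory_format` is covered without touching its individual strategy registration.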
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118667
Approved by: https://github.com/wanchaol
ghstack dependencies: #117667, #117666, #118209, #118191