(#23574)
Summary:
Assert that no single memory location is written to more than once; such
overlapping writes caused corrupted output.
Fixed the batched matrix tril/triu logic, which relied on the previous copy
behavior to support tensors with stride 0 in the leading dimension.
This fixes the issue reported at: https://github.com/pytorch/pytorch/issues/23063
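
For illustration only, here is a minimal pure-Python sketch of what "written to
more than once" means for a strided layout. It is not the check added in this
PR: `has_internal_overlap` is a hypothetical helper that brute-forces every
index, and the sizes and strides are made-up examples.

```python
from itertools import product


def has_internal_overlap(sizes, strides):
    """Return True if two distinct indices map to the same storage offset."""
    seen = set()
    # Visit every multi-dimensional index of the (hypothetical) tensor.
    for index in product(*(range(n) for n in sizes)):
        # Storage offset of an element = sum of index * stride per dimension.
        offset = sum(i * s for i, s in zip(index, strides))
        if offset in seen:
            return True  # a write through this layout would hit the slot twice
        seen.add(offset)
    return False


# Contiguous 2x3 layout: every element has its own storage slot.
print(has_internal_overlap((2, 3), (3, 1)))  # False

# Same shape with stride 0 in the leading dimension (the layout an expanded
# row would have): both rows alias the same three slots.
print(has_internal_overlap((2, 3), (0, 1)))  # True
```

With stride 0 in the leading dimension, writing a different value per row into
such a layout is not well defined, which is the kind of destination the
assertion is meant to reject.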
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23574
Differential Revision: D16600717
Pulled By: ezyang
fbshipit-source-id: e41e14f03eccf97398b64ba43647110beb1529e6