DeepSpeed
6fcccfa2 - DeepCompile: Specify tensor aliasing in C++ op schema (#7597)

Commit

91 days ago

DeepCompile: Specify tensor aliasing in C++ op schema (#7597)

PyTorch C++ op schema [1] allows specifying tensor storage aliasing by annotating `(a)` after input/output types. Torch inductor uses this information to determine where to insert explicit `del` statements for tensors that are no longer needed.

If what an op schema specifies disagrees with the op implementation, inductor-generated code is likely to release tensors earlier than expected, leading to wrong results.

`wait_allgather` and `release_param` return the first argument unchanged, and that aliasing should be annotated in the schema.

Also remove the code related to `clone_custom_op_output`, as it is solely a workaround for the aforementioned issue.

[1] https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md

Fixes: #7596
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>

References

#7597 - DeepCompile: Specify tensor aliasing in C++ op schema

Author

eternalNight

Parents

47b3fb5e
