DeepCompile: Specify tensor aliasing in C++ op schema (#7597)
The PyTorch C++ op schema [1] lets an op declare tensor storage aliasing
by annotating input and output types with an alias set such as `(a)`.
TorchInductor uses this information to decide where to insert explicit
`del` statements for tensors that are no longer needed.
If an op schema disagrees with the op implementation, inductor-generated
code is likely to release tensors earlier than expected, leading to
wrong results.
`wait_allgather` and `release_param` return their first argument
unchanged, so that aliasing should be annotated in their schemas.
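
As a rough illustration of the annotation syntax, the sketch below compares
an unannotated schema string with an annotated one. The argument names
(`x`, `graph_id`, `ds_id`) and the helper functions are hypothetical and
chosen only for this example. The regex is a simplified stand-in, not
PyTorch's schema parser.

```python
import re

# Hypothetical schema strings; the argument lists are assumptions made for
# illustration, not copied from the DeepSpeed source.
UNANNOTATED = "wait_allgather(Tensor x, int graph_id, int ds_id) -> Tensor"
ANNOTATED = "wait_allgather(Tensor(a) x, int graph_id, int ds_id) -> Tensor(a)"

# Matches a tensor type with an optional alias annotation, e.g.
# "Tensor", "Tensor(a)" or "Tensor(a!)". Group 1 is the alias set name.
_TENSOR = re.compile(r"Tensor(?:\((\w+)(!?)\))?")


def alias_sets(schema):
    """Return (argument alias names, return alias names) for tensor types.

    Unannotated tensors are reported as None. Simplified: handles only
    plain `Tensor` types, not lists, optionals, or wildcards.
    """
    args, _, rets = schema.partition("->")
    arg_part = args[args.index("(") + 1 : args.rindex(")")]
    arg_aliases = [m.group(1) for m in _TENSOR.finditer(arg_part)]
    ret_aliases = [m.group(1) for m in _TENSOR.finditer(rets)]
    return arg_aliases, ret_aliases


def returns_alias_of_first_arg(schema):
    """True if the first tensor argument shares an alias set with a return."""
    arg_aliases, ret_aliases = alias_sets(schema)
    if not arg_aliases or arg_aliases[0] is None:
        return False
    return arg_aliases[0] in ret_aliases


print(returns_alias_of_first_arg(UNANNOTATED))
print(returns_alias_of_first_arg(ANNOTATED))
```

Only the annotated form tells a schema consumer that the returned tensor
shares storage with the first argument.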
Also remove the code related to `clone_custom_op_output`, as it is solely
a workaround for the aforementioned issue.
[1]
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md
Fixes: #7596
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>