Handle aliases correctly in foreach (#119508)
Fixes https://github.com/pytorch/pytorch/issues/119436
<s>In essence we need to ensure aliases are run in separate foreach kernels so that they are ordered correctly. Previously, aliases could end up in the same kernel which creates weird scheduling dependencies.</s>
There was a bug in cycle detection/can_fuse that introduced cycles into the fused graph when more than two aliases were used in foreach nodes.
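
For readers unfamiliar with why fusion can introduce a cycle, here is a minimal, generic sketch. It is not the Inductor scheduler code and not the change made in this PR. The names `reachable_via_other` and `can_fuse_sketch`, and the node labels `w0`, `w1`, `w2`, are hypothetical and stand for three ordered operations on aliases of one buffer. The idea it illustrates: merging two nodes is unsafe if one reaches the other through a third node, because that third node would then both depend on and feed the merged node.

```python
def reachable_via_other(succ, src, dst):
    """Return True if dst is reachable from src through at least one
    intermediate node (a direct edge src -> dst alone does not count)."""
    # Start from src's successors, skipping the direct edge to dst.
    stack = [n for n in succ.get(src, ()) if n != dst]
    seen = {src, *stack}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def can_fuse_sketch(succ, a, b):
    """Hypothetical check: fusing a and b is safe only if neither reaches
    the other through a third node."""
    return not (reachable_via_other(succ, a, b) or reachable_via_other(succ, b, a))


# Assumed example: three ordered operations on aliases of the same buffer.
# Edges point from an operation to the operations that must run after it.
succ = {"w0": ["w1", "w2"], "w1": ["w2"], "w2": []}

fuse_adjacent = can_fuse_sketch(succ, "w0", "w1")  # no node sits between them
fuse_across = can_fuse_sketch(succ, "w0", "w2")    # w1 sits between them
```

With only two aliased operations there is no intermediate node, so this particular failure mode needs at least three.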
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119508
Approved by: https://github.com/jansel