[PyTorch] Reduce errors of `foreach` functions (#56993)
Summary:
This is based on https://github.com/pytorch/pytorch/issues/48224.
To make `foreach` functions more flexible, this PR routes unsupported cases to the slow path instead of raising an error.
It also adds tests to verify that:
- `foreach` functions work with tensors of different dtypes and/or memory layouts: https://github.com/pytorch/pytorch/commit/7bd4b2c89fad23c17a58969623ea7145833548a1
- `foreach` functions work when a list contains tensors on different devices, provided that tensors at the same index across the input lists are on the same device: https://github.com/pytorch/pytorch/commit/def4b9b5a19c325bb7f82ef6d69ca28fa2927131
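The fallback behavior described above can be pictured with a toy model in plain Python. This is only a sketch of the idea, not the PyTorch implementation: `FakeTensor`, `can_use_fast_path`, and `foreach_add_scalar` are hypothetical names invented for illustration, and the eligibility check (same dtype, same device, contiguous) is an assumed simplification of the real conditions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeTensor:
    # Hypothetical stand-in for a tensor: values plus the metadata the check reads.
    values: List[float]
    dtype: str = "float32"
    device: str = "cpu"
    contiguous: bool = True


def can_use_fast_path(tensors: List[FakeTensor]) -> bool:
    """Assumed rule: every tensor shares dtype and device and is contiguous."""
    first = tensors[0]
    return all(
        t.dtype == first.dtype and t.device == first.device and t.contiguous
        for t in tensors
    )


def foreach_add_scalar(
    tensors: List[FakeTensor], scalar: float
) -> Tuple[List[FakeTensor], str]:
    """Add `scalar` to every tensor and report which path was chosen."""
    if not tensors:
        raise ValueError("tensor list must not be empty")
    # Unsupported inputs are routed to the slow path rather than rejected.
    path = "fast" if can_use_fast_path(tensors) else "slow"
    # Both paths produce the same values; in this toy model only the label differs.
    results = [
        FakeTensor([v + scalar for v in t.values], t.dtype, t.device, t.contiguous)
        for t in tensors
    ]
    return results, path
```

In this model, a list mixing `float32` and `float64` entries returns the same per-tensor results as a uniform list, and only the reported path changes from `"fast"` to `"slow"`.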
Future plans:
1. Improve unit test coverage by using the `ops` decorator, updating `foreach_unary_op_db`, and creating `foreach_(binary|pointwise|minmax)_db`.
2. Support broadcasting in the slow path. Ref: https://github.com/pytorch/pytorch/pull/52448
3. Support type promotion in the fast path. Ref: https://github.com/pytorch/pytorch/pull/52449
CC: ngimel mcarilli ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56993
Reviewed By: zou3519
Differential Revision: D28630580
Pulled By: ngimel
fbshipit-source-id: e26ee74a39a591025e18c1ead48948cb7ec53c19