[test_foreach] add cases of zero-size tensors (#95028)
Supply zero-size tensors only when multi_tensor_apply_kernel would be called with high probability, i.e. when the device is CUDA and the dtype is float32.
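The gating condition can be sketched roughly as follows. This is a torch-free illustration, not the code in the PR: `should_include_zero_size` and `sample_shapes` are hypothetical names, and the shapes are made up for the example.

```python
from typing import List, Tuple


def should_include_zero_size(device_type: str, dtype_name: str) -> bool:
    # Hypothetical predicate: zero-size inputs are only added for the
    # configuration described above (CUDA device, float32 dtype).
    return device_type == "cuda" and dtype_name == "float32"


def sample_shapes(
    num_tensors: int, device_type: str, dtype_name: str
) -> List[Tuple[int, ...]]:
    # Regular square shapes of decreasing size; the sizes are illustrative only.
    shapes: List[Tuple[int, ...]] = [
        (num_tensors - i, num_tensors - i) for i in range(num_tensors)
    ]
    if not should_include_zero_size(device_type, dtype_name):
        return shapes
    # Interleave an empty shape after each regular one so the tensor list
    # mixes non-empty and zero-size entries.
    mixed: List[Tuple[int, ...]] = []
    for shape in shapes:
        mixed.append(shape)
        mixed.append((0,))
    return mixed
```

With these assumptions, `sample_shapes(2, "cuda", "float32")` mixes in empty shapes, while any other device/dtype pair yields only the regular shapes.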
Related:
- https://github.com/pytorch/pytorch/pull/94655
- https://github.com/pytorch/pytorch/issues/94865
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95028
Approved by: https://github.com/ngimel