[add/sub] Cast `alpha` to `acc_type` (#60227)
Summary:
This PR makes the `torch.add` & `torch.sub` CUDA kernels cast `alpha` to `acc_type` instead of `scalar_t`.
I do not remove the `cast`s from `test/test_foreach.py` here because I'll do that in https://github.com/pytorch/pytorch/issues/59907 or a follow-up to it.
Current upstream `torch._foreach_add` & `torch._foreach_sub` upcast the `alpha` parameter to `acc_type<scalar_t>`, while `torch.add` & `torch.sub` do not. This is problematic because the outputs of `torch.add` and `torch.sub` differ from those of `torch._foreach_add` and `torch._foreach_sub`, respectively, when the dtype of the input tensors is `torch.half` or `torch.bfloat16`. The discrepancy is roughly proportional to `abs(alpha)`, except when `alpha` is exactly representable in 16 bits.
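The effect can be sketched in pure Python, without a GPU, by emulating the two rounding paths with `struct`'s IEEE 754 binary16 (`'e'`) format. The values `a`, `b`, `alpha` below are arbitrary picks chosen to expose the discrepancy, and double precision stands in for the `float` acc_type:

```python
import struct

def to_half(x: float) -> float:
    """Round-trip through IEEE 754 binary16 to emulate rounding to half."""
    return struct.unpack('e', struct.pack('e', x))[0]

a, b, alpha = 0.0, 48.0, 0.1  # 0.1 is not exactly representable in 16 bits

# Old torch.add path: alpha cast to scalar_t (half), so the multiply
# already rounds at half precision before the add.
old = to_half(to_half(a) + to_half(to_half(alpha) * to_half(b)))

# foreach / fixed path: alpha stays in acc_type, and only the final
# result is rounded back down to half.
new = to_half(a + alpha * b)

print(old, new)  # 4.796875 4.80078125
```

With half inputs the two paths land one ulp apart here; larger `alpha` values widen the gap, which is the discrepancy this PR removes.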
ref:
- `torch._foreach_add` & `torch._foreach_sub` cast `alpha`: https://github.com/pytorch/pytorch/blob/6d0fb85a623f5ef3f3f1a2afc3660cb71fa70511/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu#L21-L28, `BinaryOpListAlphaFunctor` is defined here: https://github.com/pytorch/pytorch/blob/6d0fb85a623f5ef3f3f1a2afc3660cb71fa70511/aten/src/ATen/native/cuda/ForeachFunctors.cuh#L202
related: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/59907
cc ngimel ptrblck mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60227
Reviewed By: mruberry
Differential Revision: D29252759
Pulled By: ngimel
fbshipit-source-id: 847f3b9493ae30a900f7445af00aef1abcc1ab21