WelfordOps: Remove combine_t and use acc_scalar_t instead (#94522)
`combine_t` is the type used to represent the number of elements seen so far as
a floating-point value (acc.nf). It is always used in calculations with other
values of type `acc_scalar_t`, so nothing is gained in performance by making it a
separate template argument. Furthermore, when calculating the variance on CUDA,
it is always set to `float`, which means counts are unnecessarily truncated to
`float` precision before being immediately promoted to `double`.
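As a rough illustration of the change, here is a minimal C++ sketch of a Welford accumulator
templated only on `acc_scalar_t`, with the count `nf` sharing that type. The names
`WelfordState`, `combine`, `count_via_float` and `count_direct` are hypothetical and do not
mirror PyTorch's actual `WelfordOps`/`WelfordData` definitions. The last two helpers show why
storing the count as `float` first loses information: any integer count above 2^24 may not be
representable in `float`.

```cpp
#include <cstdint>

// Hypothetical, simplified Welford state (not PyTorch's real WelfordData).
// The running count `nf` uses the same type as the mean and m2 accumulators.
template <typename acc_scalar_t>
struct WelfordState {
  acc_scalar_t mean = 0;
  acc_scalar_t m2 = 0;  // sum of squared deviations from the mean
  acc_scalar_t nf = 0;  // element count, formerly of a separate combine_t type
};

// Merge two partial results using the standard parallel Welford update.
// The count is combined directly in acc_scalar_t, so no narrowing occurs.
template <typename acc_scalar_t>
WelfordState<acc_scalar_t> combine(const WelfordState<acc_scalar_t>& a,
                                   const WelfordState<acc_scalar_t>& b) {
  if (a.nf == 0) return b;
  if (b.nf == 0) return a;
  const acc_scalar_t delta = b.mean - a.mean;
  const acc_scalar_t new_count = a.nf + b.nf;
  const acc_scalar_t nb_over_n = b.nf / new_count;
  WelfordState<acc_scalar_t> out;
  out.mean = a.mean + delta * nb_over_n;
  out.m2 = a.m2 + b.m2 + delta * delta * a.nf * nb_over_n;
  out.nf = new_count;
  return out;
}

// Old path (hypothetical helper): count stored as float, then promoted to double.
inline double count_via_float(std::int64_t n) {
  return static_cast<double>(static_cast<float>(n));
}

// New path (hypothetical helper): count stored directly as acc_scalar_t = double.
inline double count_direct(std::int64_t n) {
  return static_cast<double>(n);
}
```

For example, with `n = 2^24 + 1 = 16777217`, `count_via_float(n)` yields `16777216.0` under the
default round-to-nearest mode, while `count_direct(n)` keeps `16777217.0`. Merging the
single-element states `{mean 1, count 1}` and `{mean 3, count 1}` gives mean 2 and `m2` of 2,
matching the two-sample values computed directly.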
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94522
Approved by: https://github.com/ngimel