Fix bincount to use acc scalar for the bounds (#76979)
The bounds could overflow when the number of bins exceeds the largest value the input dtype can represent, e.g. when uint8 inputs need 256 bins (256 does not fit in uint8, whose maximum is 255).
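
A minimal sketch of the failure mode, in plain Python rather than the actual kernel code. The names `wrap_uint8` and `bincount_sketch` are hypothetical and exist only for illustration. It assumes the upper bound is compared against each input value, and shows what happens when that bound is stored in the narrow input type instead of a wider accumulate type:

```python
import ctypes


def wrap_uint8(value: int) -> int:
    """Store `value` in an 8-bit unsigned integer; out-of-range values wrap."""
    return ctypes.c_uint8(value).value


def bincount_sketch(values, nbins, narrow_bounds):
    """Hypothetical bounds-checked bincount, not the PyTorch implementation.

    With narrow_bounds=True the upper bound is stored as uint8 (the input
    type); with narrow_bounds=False it keeps its full integer value, which
    stands in for a wider accumulate type.
    """
    upper = wrap_uint8(nbins) if narrow_bounds else nbins
    counts = [0] * nbins
    for v in values:
        # Values outside [0, upper] are dropped; v < nbins guards the index.
        if 0 <= v <= upper and v < nbins:
            counts[v] += 1
    return counts


values = [0, 1, 1, 255]
nbins = 256  # one bin per possible uint8 value

broken = bincount_sketch(values, nbins, narrow_bounds=True)
fixed = bincount_sketch(values, nbins, narrow_bounds=False)

print(wrap_uint8(nbins))  # the bound 256 wraps around when stored as uint8
print(sum(broken), sum(fixed))  # number of inputs that landed in a bin
```

In this sketch the narrow bound wraps to 0, so every nonzero input is rejected, while the wide bound counts all four inputs.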
Thank you, Yang Xiaobo, for posting a reproducing example in the forums.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76979
Approved by: https://github.com/ngimel