Compute CUDA reduction buffer size in elements (#63969)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/63885
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63969
Reviewed By: mruberry
Differential Revision: D30549423
Pulled By: ngimel
fbshipit-source-id: b16d25030d44ced789c125a333d72b02a8f45067
Author: Natalia Gimelshein