Use non-overflowing divide in CUDA kernel util GET_BLOCKS (#44391)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43476.
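
For illustration only, a minimal sketch of the rounding-up divide the title refers to. This is not the exact code in the PR: the names `get_blocks` and `kNumThreads` and the value 1024 are assumptions made for this example. The usual form `(n + t - 1) / t` can wrap around when `n` is close to the maximum of its integer type, whereas `(n - 1) / t + 1` gives the same result for positive `n` without ever adding to `n`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical threads-per-block constant, assumed for this sketch only.
constexpr int64_t kNumThreads = 1024;

// Number of blocks needed to cover n elements (n must be positive).
inline int get_blocks(int64_t n, int64_t threads_per_block = kNumThreads) {
  assert(n > 0 && threads_per_block > 0);
  // Equals ceil(n / threads_per_block) for n > 0. Nothing is added to n
  // before the division, so the intermediate value cannot overflow.
  const int64_t blocks = (n - 1) / threads_per_block + 1;
  // The result is narrowed to int, so check that it fits first.
  assert(blocks <= std::numeric_limits<int>::max());
  return static_cast<int>(blocks);
}
```

With a 32-bit `int` argument, the naive form would overflow for inputs such as `std::numeric_limits<int>::max()`; the sketch above handles that input because the arithmetic is done in 64 bits and the numerator is never incremented.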
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44391
Reviewed By: mrshenli
Differential Revision: D23602424
Pulled By: walterddr
fbshipit-source-id: 40ed81547f933194ce5bf4a5bcebdb3434298bc1