Convert num_kernels to int64 before calling into CUDA GET_BLOCKS (#44688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44688
Fixes https://github.com/pytorch/pytorch/issues/44472
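
To illustrate the idea in the title, here is a minimal host-side sketch. It assumes the underlying problem is a 32-bit overflow when `num_kernels` is computed as a product of `int` dimensions. The names `get_blocks_sketch`, `num_kernels_widened`, and `kNumThreads` are hypothetical stand-ins for illustration, not the actual PyTorch helpers, and the thread count of 1024 is likewise an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed threads-per-block value, for illustration only.
constexpr int kNumThreads = 1024;

// Hypothetical stand-in for a GET_BLOCKS-style helper: takes a 64-bit
// element count and returns the number of blocks needed to cover it.
inline int get_blocks_sketch(int64_t n) {
  assert(n > 0);
  // Ceiling division performed entirely in 64-bit arithmetic.
  const int64_t blocks = (n - 1) / kNumThreads + 1;
  // The block count itself must still fit in an int.
  assert(blocks <= std::numeric_limits<int>::max());
  return static_cast<int>(blocks);
}

// Widen the first operand before multiplying so the whole product is
// evaluated as int64_t. Writing `channels * height * width` with plain
// int operands would overflow (undefined behavior) for large shapes
// before any later conversion to int64_t could help.
inline int64_t num_kernels_widened(int channels, int height, int width) {
  return static_cast<int64_t>(channels) * height * width;
}
```

For example, `num_kernels_widened(4096, 1024, 1024)` yields 2^32 elements, which does not fit in a 32-bit `int`, while the resulting block count from `get_blocks_sketch` (2^22) does.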
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D23699819
Pulled By: soulitzer
fbshipit-source-id: 7ecfe78d09344178d1e6c7e1503417feb6beff6c