Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64713
Resubmit of #64442
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D30825646
Pulled By: ngimel
fbshipit-source-id: 66b06bd0b30b401833e337920681d19d96b11f9d