Use `cub::FutureValue` to simplify 64-bit indexing split of cub scan (#66711)
Summary:
https://github.com/NVIDIA/cub/pull/305 has landed in cub 1.15, so this PR is ready to review and land. This PR contains https://github.com/pytorch/pytorch/pull/66219; please land that PR first, before reviewing this one.
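A rough plain-Python model of the idea, not PyTorch's actual `cub.cuh` code and not cub's API. A scan over more elements than a 32-bit index can address is split into chunks, and each chunk's carry-in is the last output of the previous chunk. A future value lets that carry-in be passed as a reference that is read only when the chunk actually runs, so all chunk "launches" can be queued up front without waiting for earlier results. The names `FutureValue`, `inclusive_scan_chunk`, and `chunked_inclusive_scan` are hypothetical stand-ins, and `max_chunk` is a small illustrative limit in place of the real 32-bit bound.

```python
from typing import Callable, List, Optional, Sequence


class FutureValue:
    """Hypothetical stand-in for cub::FutureValue: records where a value
    lives and reads it only when get() is called."""

    def __init__(self, buffer: List[int], index: int) -> None:
        self.buffer = buffer
        self.index = index

    def get(self) -> int:
        # Deferred read: the slot may be written after this object is created.
        return self.buffer[self.index]


def inclusive_scan_chunk(inp: Sequence[int], out: List[int], start: int, count: int,
                         op: Callable[[int, int], int],
                         init: Optional[FutureValue] = None) -> None:
    """Inclusive scan of inp[start:start+count] into out, folding in the
    carry-in (if any) before the first element."""
    acc = None if init is None else init.get()
    for i in range(start, start + count):
        acc = inp[i] if acc is None else op(acc, inp[i])
        out[i] = acc


def chunked_inclusive_scan(inp: Sequence[int], op: Callable[[int, int], int],
                           max_chunk: int) -> List[int]:
    """Split one long scan into chunks of at most max_chunk elements."""
    out = [0] * len(inp)
    stream = []  # queued "kernel launches", executed later in order
    for start in range(0, len(inp), max_chunk):
        count = min(max_chunk, len(inp) - start)
        # out[start - 1] is not computed yet at enqueue time; the FutureValue
        # only records where the carry-in will be read from.
        init = FutureValue(out, start - 1) if start > 0 else None
        stream.append(lambda s=start, c=count, iv=init:
                      inclusive_scan_chunk(inp, out, s, c, op, iv))
    for launch in stream:
        launch()
    return out


print(chunked_inclusive_scan([1, 2, 3, 4, 5, 6, 7], lambda a, b: a + b, max_chunk=3))
# prints [1, 3, 6, 10, 15, 21, 28]
```

Without the deferred read, the host would have to wait for each chunk to finish and copy its last output back before launching the next chunk. Queuing every launch with a reference to the carry-in slot avoids that synchronization.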
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66711
Reviewed By: soulitzer
Differential Revision: D32698306
Pulled By: ngimel
fbshipit-source-id: 4cc6b9b24cefd8932f4d421c6d64ea20ea911f52