[torch] Add CUDA support for segment reduction 'max' (#56704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56704
This is a resubmission of PR: https://github.com/pytorch/pytorch/pull/54175
Main changes compared to original PR:
- Switch to including "<ATen/cuda/cub.cuh>"
- Use CUB_WRAPPER to reduce boilerplate code.
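For reviewers unfamiliar with the operation, the following is a minimal plain-Python sketch of what a segment 'max' reduction computes. It is not the CUDA kernel from this PR. The function name `segment_max`, the `lengths` argument, and the `initial` fallback for empty segments are assumptions made for illustration only and may not match the actual operator's behavior.

```python
from typing import List, Optional, Sequence


def segment_max(
    data: Sequence[float],
    lengths: Sequence[int],
    initial: Optional[float] = None,
) -> List[Optional[float]]:
    """Reference sketch: max over consecutive segments of `data`.

    `lengths[i]` is the number of elements in segment i. Empty segments
    yield `initial` (a hypothetical choice for this sketch).
    """
    if any(n < 0 for n in lengths):
        raise ValueError("segment lengths must be non-negative")
    if sum(lengths) != len(data):
        raise ValueError("segment lengths must sum to len(data)")

    out: List[Optional[float]] = []
    start = 0
    for n in lengths:
        segment = data[start:start + n]
        # max() raises on an empty sequence, so fall back to `initial`.
        out.append(max(segment) if segment else initial)
        start += n
    return out
```

With this sketch, `segment_max([1, 5, 2, 7, 3], [2, 3])` reduces the segments `[1, 5]` and `[2, 7, 3]` independently.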
Test Plan:
Will check CI status to make sure a
Added unit test
Reviewed By: ngimel
Differential Revision: D27941257
fbshipit-source-id: 24a0e0c7f6c46126d2606fe42ed03dca15684415