[torch][segment_reduce] Add support for multi-dimensional input (cuda) (#60018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60018
Same as title. This diff completes CUDA support for the currently implemented reductions and input parameters.
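For reference, here is a minimal pure-Python sketch of what a segment reduction along axis 0 computes when the input is multi-dimensional. It is not the ATen/CUDA implementation in this diff. The function name `segment_reduce_axis0`, the `empty_value` default used for zero-length segments, the restriction to "max" and "mean", and the assumption that trailing dimensions are already flattened into columns are all hypothetical, chosen for illustration.

```python
import math
from typing import List, Sequence

Row = List[float]


def segment_reduce_axis0(
    data: Sequence[Sequence[float]],
    lengths: Sequence[int],
    reduction: str = "max",
    empty_value: float = math.nan,  # hypothetical default for length-0 segments
) -> List[Row]:
    """Reduce consecutive runs of rows along axis 0 (reference sketch only).

    data has shape (N, D): axis 0 is segmented and the trailing dimensions
    are assumed to be flattened into D columns. lengths[i] is the number of
    rows in segment i, and sum(lengths) must equal N. The result has shape
    (len(lengths), D); each column is reduced independently per segment.
    """
    if sum(lengths) != len(data):
        raise ValueError("sum(lengths) must equal the size of axis 0")
    width = len(data[0]) if data else 0
    out: List[Row] = []
    start = 0
    for length in lengths:
        segment = data[start:start + length]
        start += length
        if length == 0:
            # Empty segment: emit the placeholder value in every column.
            out.append([empty_value] * width)
            continue
        row: Row = []
        for col in range(width):
            values = [r[col] for r in segment]
            if reduction == "max":
                row.append(max(values))
            elif reduction == "mean":
                row.append(sum(values) / length)
            else:
                raise ValueError(f"unsupported reduction: {reduction}")
        out.append(row)
    return out
```

With `data = [[1.0, 5.0], [3.0, 2.0], [4.0, 4.0], [0.0, 8.0]]` and `lengths = [2, 0, 2]`, the "max" reduction yields `[[3.0, 5.0], [nan, nan], [4.0, 8.0]]`. The empty-segment row shows why the default value for length 0 is listed as an open item below.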
Next Steps:
- Add support for sum/min
- More testing and benchmarking
- Cleanup
- Update default output values for segments of length 0
- Use TensorIterator
- Update documentation
Test Plan: Unit test covering the CUDA forward path.
Reviewed By: ngimel
Differential Revision: D29135373
fbshipit-source-id: d070727eeb660f56782e7ac8a5b0798be688480a