[torch][segment_reduce] Add cuda support for mean reduction (#59543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59543
Builds on the previous PR: https://github.com/pytorch/pytorch/pull/59521
This diff adds CUDA support for mean reduction (forward pass only for now).
The CUDA backward implementation will follow in a subsequent PR.
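
For reviewers, here is a minimal pure-Python sketch of what a forward mean reduction over segments computes. It is a reference for the semantics only, not the CUDA kernel in this diff. The helper name `segment_mean_reference`, the `empty_value` parameter, and the choice of NaN for empty segments are assumptions made for illustration. So is the check that the lengths sum to the data size.

```python
import math


def segment_mean_reference(data, lengths, empty_value=math.nan):
    """Return the mean of each consecutive segment of a 1-D sequence.

    Hypothetical reference helper, not part of PyTorch. `lengths[i]` is the
    number of elements in segment i; segments are laid out back to back.
    """
    if any(n < 0 for n in lengths):
        raise ValueError("segment lengths must be non-negative")
    if sum(lengths) != len(data):
        raise ValueError("segment lengths must sum to len(data)")

    means = []
    start = 0
    for n in lengths:
        segment = data[start:start + n]
        # Assumption: an empty segment yields `empty_value` (NaN by default).
        means.append(sum(segment) / n if n > 0 else empty_value)
        start += n
    return means


# Segments: [1, 2], [3, 4, 5], [6]
print(segment_mean_reference([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3, 1]))
# prints [1.5, 4.0, 6.0]
```

A CUDA forward pass would be expected to produce the same values for 1-D input, which makes a reference like this useful as an oracle when comparing CPU and CUDA results in unit tests.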
Next Steps:
CUDA backward support for mean
2D data input support
more testing
benchmarking
Test Plan: Update the unit test to cover CUDA mean reduction as well.
Reviewed By: ngimel
Differential Revision: D28922838
fbshipit-source-id: 72b7e5e79db967116b96ad010f290c9f057232d4