Switch reduce_scatter and all_gather in DeviceMesh to use functional collectives (#96226)
This change also introduces gather_dim and scatter_dim to the DeviceMesh collectives to simplify user code.
The current plan is to keep padding and gather/scatter dim support in DeviceMesh while we explore optimization opportunities in Inductor.
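
For illustration only, here is a single-process sketch of what padding plus a gather dim means for an unevenly sharded tensor. It is not the DeviceMesh implementation and makes no torch.distributed calls. Every helper name (cat, pad_cols, gather_cols_with_padding) is hypothetical, and nested lists stand in for tensors.

```python
# Single-process simulation of gathering along dim 1 with padding.
# All helper names are hypothetical; none are DeviceMesh or torch APIs.

def cat(blocks, dim):
    """Concatenate 2-D nested lists along dim 0 (rows) or dim 1 (columns)."""
    if dim == 0:
        return [list(row) for block in blocks for row in block]
    return [[x for block in blocks for x in block[r]]
            for r in range(len(blocks[0]))]

def pad_cols(block, width, fill=0):
    """Right-pad every row so the block has `width` columns."""
    return [row + [fill] * (width - len(row)) for row in block]

def gather_cols_with_padding(shards):
    """Pad uneven column shards, 'gather' them, then strip the padding."""
    widths = [len(s[0]) for s in shards]
    # Equal shapes on every rank, as an even collective would require.
    padded = [pad_cols(s, max(widths)) for s in shards]
    # A real collective would exchange `padded` between ranks at this point.
    unpadded = [[row[:w] for row in p] for p, w in zip(padded, widths)]
    # Concatenating along dim 1 plays the role of gather_dim=1.
    return cat(unpadded, dim=1)

full = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10]]
# Shard the 2x5 "tensor" along dim 1 across two simulated ranks (widths 3 and 2).
shards = [[row[0:3] for row in full], [row[3:5] for row in full]]
result = gather_cols_with_padding(shards)
```

Without a gather dim on the collective, the caller would have to do the pad, unpad, and concatenate steps by hand around a dim-0 gather. That is the kind of user code this change aims to simplify.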
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96226
Approved by: https://github.com/wanchaol