Replace `dim_apply` with `TensorIterator` (#58656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58656
Ref gh-56794
`dim_apply` is problematic because it calls `Tensor.select` inside a parallel
region. Replace it with a `TensorIterator` that squashes the apply dimension,
so the kernel walks each slice through raw data pointers and strides instead
of creating a `Tensor` per slice. This is similar to the `_dim_apply` function
already used by the sort kernels:
https://github.com/pytorch/pytorch/blob/8c91acc161a742f9f36ab37a55ec46767404b544/aten/src/ATen/native/cpu/SortingKernel.cpp#L27
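
For readers unfamiliar with the idea, here is a minimal standalone sketch of
what "squashing the apply dimension" means. It is an illustration only and does
not use ATen: `for_each_slice` and `cumsum_along_dim` are hypothetical names,
and the plain sequential loop stands in for the iteration that `TensorIterator`
would perform. The outer loop treats the apply dimension as size 1, and the
kernel receives a base pointer, a stride, and a length for each 1-D slice, so
no per-slice tensor object is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ATen API): calls f(ptr, stride, length) once for
// every 1-D slice along `dim` of a strided array.
template <typename T, typename F>
void for_each_slice(T* data,
                    const std::vector<int64_t>& sizes,
                    const std::vector<int64_t>& strides,
                    std::size_t dim,
                    F f) {
  assert(sizes.size() == strides.size() && dim < sizes.size());

  // "Squash" the apply dimension: the outer iteration sees it as size 1.
  std::vector<int64_t> outer = sizes;
  outer[dim] = 1;

  int64_t n_slices = 1;
  for (int64_t s : outer) n_slices *= s;

  std::vector<int64_t> idx(sizes.size(), 0);
  for (int64_t i = 0; i < n_slices; ++i) {
    // Element offset of the first element of the current slice.
    int64_t offset = 0;
    for (std::size_t d = 0; d < idx.size(); ++d) offset += idx[d] * strides[d];

    f(data + offset, strides[dim], sizes[dim]);

    // Advance the multi-index over the squashed shape (last dim fastest).
    for (std::size_t d = idx.size(); d-- > 0;) {
      if (++idx[d] < outer[d]) break;
      idx[d] = 0;
    }
  }
}

// Example kernel: in-place cumulative sum along `dim`.
template <typename T>
void cumsum_along_dim(std::vector<T>& data,
                      const std::vector<int64_t>& sizes,
                      const std::vector<int64_t>& strides,
                      std::size_t dim) {
  for_each_slice(data.data(), sizes, strides, dim,
                 [](T* p, int64_t stride, int64_t n) {
                   for (int64_t k = 1; k < n; ++k) {
                     p[k * stride] += p[(k - 1) * stride];
                   }
                 });
}
```

Because every slice is described only by a pointer and a stride, the slices can
be handed to worker threads without calling any tensor operation inside the
parallel region, which is the property this change is after.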
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28776441
Pulled By: ngimel
fbshipit-source-id: 14449d4b12ed4576f879bb65a35e881ce1a953b1