unfold_backward: Remove stride >= size kernel in favour of copy_ (#88061)
unfold_backward has a dedicated kernel for the `stride >= size` case, which uses
temporary tensors created by `at::arange` to map unfolded indices back to folded
ones. This PR instead uses `unfold` to view the output and copies the gradient
directly into that view with `copy_`. This works because windows do not overlap
when `stride >= size`, so no accumulation is needed.
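
As a rough illustration of why a plain copy is enough in this case, here is a
pure-Python sketch of the 1-D situation. It is not the ATen implementation: the
helper names `unfold_1d` and `unfold_backward_1d` are hypothetical, and lists
stand in for tensors.

```python
def unfold_1d(x, size, step):
    """Return the windows that a 1-D unfold with this size and step would produce."""
    n_windows = (len(x) - size) // step + 1
    return [x[i * step : i * step + size] for i in range(n_windows)]


def unfold_backward_1d(grad, input_len, size, step):
    """Gradient of unfold_1d with respect to x, for step >= size only.

    With step >= size the windows never overlap, so each gradient element
    maps to exactly one input position and can be copied without accumulation.
    """
    assert step >= size, "this sketch covers only the non-overlapping case"
    grad_input = [0.0] * input_len
    for w, window in enumerate(grad):
        start = w * step
        # Direct copy into the window's slot; positions in the gaps stay zero.
        grad_input[start : start + size] = window
    return grad_input


# Gradient for 2 windows of size 2, taken with step 3 from an input of length 7.
grad = [[1.0, 2.0], [3.0, 4.0]]
result = unfold_backward_1d(grad, input_len=7, size=2, step=3)
print(result)  # [1.0, 2.0, 0.0, 3.0, 4.0, 0.0, 0.0]
```

When `step < size` the windows overlap and overlapping gradient entries would
have to be summed, which a plain copy cannot express.
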
In benchmarks I see either no difference or a marginal speed benefit from
this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88061
Approved by: https://github.com/albanD