`unfold_backward` gets its own kernel (#36612)
Summary:
`unfold_backward` uses `index_add`, which causes a performance regression on CUDA because of the underlying `atomicAdd`, and on CPU because of limited parallelization. This PR replaces `index_add` with a custom kernel.
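For illustration only, here is a minimal pure-Python sketch of the two accumulation strategies for a 1-D unfold. It is not the kernel added by this PR; the function names `unfold_backward_scatter` and `unfold_backward_gather` and the list-of-lists layout are assumptions made for the example. The scatter form mirrors an `index_add`-style accumulation, where overlapping windows write to the same input slot (the situation that needs `atomicAdd` on CUDA). The gather form gives each input element a single writer, so no atomics are needed.

```python
def unfold_backward_scatter(grad_out, n, size, step):
    """index_add-style: every window element is added into its source slot.

    Overlapping windows hit the same slot, so a parallel version of this
    loop would need atomic additions.
    """
    grad_in = [0.0] * n
    for i, window in enumerate(grad_out):
        for j, g in enumerate(window):
            grad_in[i * step + j] += g
    return grad_in


def unfold_backward_gather(grad_out, n, size, step):
    """Gather-style: each input element sums the window entries that read it.

    Every grad_in[k] has exactly one writer, so the outer loop can be
    parallelized over k without atomics.
    """
    num_windows = len(grad_out)
    grad_in = [0.0] * n
    for k in range(n):
        # Window i covers input indices [i*step, i*step + size - 1].
        first = max(0, -(-(k - size + 1) // step))  # ceil((k-size+1)/step)
        last = min(num_windows - 1, k // step)
        total = 0.0
        for i in range(first, last + 1):
            total += grad_out[i][k - i * step]
        grad_in[k] = total
    return grad_in


# Hypothetical example: input length 7, window size 3, step 2 -> 3 windows
# covering indices 0-2, 2-4 and 4-6 (indices 2 and 4 are shared).
grad_out = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
scatter_result = unfold_backward_scatter(grad_out, n=7, size=3, step=2)
gather_result = unfold_backward_gather(grad_out, n=7, size=3, step=2)
```

Both functions produce the same gradient; they differ only in which loop owns the write, which is what determines whether a parallel implementation needs atomics.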
Fixes https://github.com/pytorch/pytorch/issues/17501.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36612
Differential Revision: D21450349
Pulled By: albanD
fbshipit-source-id: 09ec1fbd5d7290656700eca8e7fb7cf52323ec28