`functional.max_unpool`: OpInfo tests + simpler backward + forward AD + forward-over-backward AD
Resolves https://github.com/pytorch/pytorch/issues/67657, https://github.com/pytorch/pytorch/issues/67658, https://github.com/pytorch/pytorch/issues/67660.
The failures reported in these issues are not necessarily bugs: we cannot produce arbitrary samples that come from `max_pool` and also satisfy gradcheck in every case.
This PR also replaces the complicated low-level backward kernels with much simpler, well-tested high-level counterparts. The replacement is also faster (before: a parallel for loop; after: the memory-layout-optimized TensorIterator parallelization that comes with `gather`).
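To illustrate the idea, here is a minimal Python sketch of a `gather`-based backward for the 2D case. It is not the code in this PR: the helper name `max_unpool2d_backward_sketch` is hypothetical, and it assumes the indices come from a non-overlapping `max_pool2d`, so each index is unique within its plane.

```python
import torch
import torch.nn.functional as F


def max_unpool2d_backward_sketch(grad_output, indices):
    """Hypothetical helper: gradient w.r.t. the input of max_unpool2d.

    grad_output: (N, C, H_out, W_out), gradient w.r.t. the unpooled output.
    indices:     (N, C, H_in, W_in), flat positions inside each H_out*W_out plane.
    """
    flat_grad = grad_output.flatten(-2)  # (N, C, H_out * W_out)
    flat_idx = indices.flatten(-2)       # (N, C, H_in * W_in)
    # Unpooling scatters input values to `indices`, so the backward pass
    # reads the incoming gradient back from those same positions.
    return flat_grad.gather(-1, flat_idx).view_as(indices)


torch.manual_seed(0)
x = torch.randn(2, 3, 4, 4, dtype=torch.double)
pooled, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
pooled = pooled.detach().requires_grad_(True)

unpooled = F.max_unpool2d(pooled, indices, kernel_size=2)
grad_output = torch.randn_like(unpooled)

# Compare the sketch against what autograd computes for the real op.
(autograd_grad,) = torch.autograd.grad(unpooled, pooled, grad_output)
sketch_grad = max_unpool2d_backward_sketch(grad_output, indices)
matches = torch.allclose(autograd_grad, sketch_grad)
```

Since `gather` is an ordinary differentiable op, expressing the backward this way is also what lets higher-order and forward-mode derivatives reuse existing formulas rather than needing dedicated kernels.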
cc @albanD @mruberry @jbschlosser @walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68625
Approved by: https://github.com/albanD