Migrate conv2d from TH to ATen (CPU) (#28793)
Summary:
This is a port of the SpatialConvolutionMM TH (CPU) implementation to ATen as `slow_conv2d`. In practice it is invoked for ungrouped, non-dilated, non-float32 convolutions (e.g. float64, long, bfloat16).
- [x] unfolded_copy & unfolded_acc
- [x] forward
- [x] backward
- [x] basic sanity cross-check against the 1.3 implementation
- [x] systematic testing
- [x] performance comparison & optimization
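
For readers unfamiliar with the "MM" approach, here is a minimal pure-Python sketch of the idea, assuming `unfolded_copy` plays the usual im2col role: copy every kernel-sized patch into a column, then compute the forward pass as a matrix product with the flattened weights. Only the name `unfolded_copy` comes from the checklist above. `conv2d_mm`, the signatures, and the nested-list layout are hypothetical, and this is not the ATen code, which is C++.

```python
def unfolded_copy(image, kh, kw, stride=1, pad=0):
    """im2col sketch for a [C][H][W] nested-list image (layout is an assumption).

    Returns (columns, out_h, out_w); columns has C*kh*kw rows and
    out_h*out_w entries per row.
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    out_h = (height + 2 * pad - kh) // stride + 1
    out_w = (width + 2 * pad - kw) // stride + 1
    columns = []
    for c in range(channels):
        for ky in range(kh):
            for kx in range(kw):
                row = []
                for oy in range(out_h):
                    for ox in range(out_w):
                        y = oy * stride - pad + ky
                        x = ox * stride - pad + kx
                        inside = 0 <= y < height and 0 <= x < width
                        # Positions that fall in the padding contribute zero.
                        row.append(image[c][y][x] if inside else 0.0)
                columns.append(row)
    return columns, out_h, out_w


def conv2d_mm(image, weight, bias=None, stride=1, pad=0):
    """Forward pass as (flattened weight) x (unfolded input).

    weight is [C_out][C_in][kh][kw]; the result is [C_out][out_h][out_w].
    """
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    columns, out_h, out_w = unfolded_copy(image, kh, kw, stride, pad)
    output = []
    for o, filt in enumerate(weight):
        # Flatten in (channel, ky, kx) order to match the rows of `columns`.
        flat = [w for plane in filt for krow in plane for w in krow]
        b = bias[o] if bias is not None else 0.0
        vals = [
            b + sum(w * col[j] for w, col in zip(flat, columns))
            for j in range(out_h * out_w)
        ]
        output.append([vals[r * out_w:(r + 1) * out_w] for r in range(out_h)])
    return output


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3x3
weight = [[[[1, 0], [0, 1]]]]                 # 1 output channel, 2x2 kernel
print(conv2d_mm(image, weight))               # [[[6.0, 8.0], [12.0, 14.0]]]
```

The explicit loops are for readability only. A production implementation would hand the product to an optimized matrix-multiply routine rather than summing in Python.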
File used for performance testing: [benchmark_conv2d.py](https://gist.github.com/andreaskoepf/c2777b2e5e9d11610f9fc74372930527)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28793
Differential Revision: D18256451
Pulled By: ezyang
fbshipit-source-id: d09e84eef11ccf8a6178dfad485fe6fd0ddf0c86