slow_conv2d grad_weight: call gemm directly (#65726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65726
This change isn't strictly necessary, since the grad_weight computation
doesn't use parallel_for. However, calling gemm directly reduces
function call overhead and will make grad_weight easier to parallelize
in the future.
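
For context, here is a minimal pure-Python sketch of why grad_weight reduces to a single matrix multiply per sample. It is not the ATen code. It assumes stride 1, no padding, and one sample, and the helper names (im2col, gemm_nt, grad_weight_gemm, grad_weight_direct) are hypothetical and used only for illustration.

```python
def im2col(image, kh, kw):
    """Unfold a (C, H, W) nested list into a (C*kh*kw, oh*ow) matrix.

    Assumes stride 1 and no padding (an illustrative simplification).
    """
    c, h, w = len(image), len(image[0]), len(image[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    cols = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel col) triple.
                cols.append([image[ci][y + i][x + j]
                             for y in range(oh) for x in range(ow)])
    return cols


def gemm_nt(a, b):
    """Return a @ b^T for a of shape (m, k) and b of shape (n, k)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def grad_weight_gemm(image, grad_output, kh, kw):
    """grad_weight as (O, C*kh*kw) via one matrix multiply.

    grad_output has shape (O, oh, ow) and is flattened to (O, oh*ow).
    """
    cols = im2col(image, kh, kw)
    go = [[v for row in plane for v in row] for plane in grad_output]
    return gemm_nt(go, cols)


def grad_weight_direct(image, grad_output, kh, kw):
    """Reference: accumulate grad_weight from the convolution definition."""
    c, o = len(image), len(grad_output)
    oh, ow = len(grad_output[0]), len(grad_output[0][0])
    out = [[0.0] * (c * kh * kw) for _ in range(o)]
    for oc in range(o):
        for ci in range(c):
            for i in range(kh):
                for j in range(kw):
                    s = 0.0
                    for y in range(oh):
                        for x in range(ow):
                            s += grad_output[oc][y][x] * image[ci][y + i][x + j]
                    out[oc][(ci * kh + i) * kw + j] = s
    return out


# Usage: one 3x3 input channel, one 2x2 grad_output plane, 2x2 kernel.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
grad_output = [[[1, 0], [0, 1]]]
print(grad_weight_gemm(image, grad_output, 2, 2))  # [[6, 8, 12, 14]]
```

Both functions compute the same values. The gemm form expresses the whole reduction as one call, which is the shape of computation this change hands to gemm directly.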
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31257877
Pulled By: ngimel
fbshipit-source-id: d8ea97cc1f43d8d9dfff355ae27c9d982838b57e