Fix convolution_depthwise3x3_winograd for multichannel output (#60460)
Summary:
Before this change it was implemented with the assumption, that number of groups, input and output channels are the same, which is not always the case
Extend the implementation to support any number of output channels as long as number of groups equals to the number of input channels (i.e. kernel.size(1) == 1)
Fixes https://github.com/pytorch/pytorch/issues/60176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60460
Reviewed By: albanD
Differential Revision: D29299693
Pulled By: malfet
fbshipit-source-id: 31130c71ce86535ccfba2f4929eee3e2e287b2f0