Added partial decomposition of conv_backward and grad_bias computation (#89128)
`convolution_backward` often just launches the `sum` that computes `grad_bias` as a separate kernel. Splitting that reduction out in a decomp allows us to fuse it into other ops: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Convolution.cpp#L2150
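For illustration, a minimal sketch of the reduction being split out. This is not the decomposition registered in this PR: the helper name `grad_bias_via_sum` and the tensor shapes are hypothetical, and it assumes the usual NCHW-style layout with channels in dim 1. It only checks that the bias gradient equals a plain `sum` of `grad_output` over every non-channel dim, which is the kind of op that can be fused with its neighbors.

```python
import torch
import torch.nn.functional as F


def grad_bias_via_sum(grad_output: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: reduce over every dim except the channel dim (dim 1).
    reduce_dims = [0] + list(range(2, grad_output.dim()))
    return grad_output.sum(dim=reduce_dims)


torch.manual_seed(0)
# float64 keeps the comparison against autograd numerically tight.
x = torch.randn(2, 3, 8, 8, dtype=torch.float64)
weight = torch.randn(4, 3, 3, 3, dtype=torch.float64)
bias = torch.randn(4, dtype=torch.float64, requires_grad=True)

out = F.conv2d(x, weight, bias, padding=1)
grad_output = torch.randn_like(out)

# Reference: bias gradient as computed by autograd through the convolution.
(autograd_grad_bias,) = torch.autograd.grad(out, bias, grad_output)

# Decomposed form: a standalone reduction over batch and spatial dims.
decomposed_grad_bias = grad_bias_via_sum(grad_output)

print(torch.allclose(autograd_grad_bias, decomposed_grad_bias))
```
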
Improves `convnext_base` throughput from 373 img/s to 383 img/s.
Not sure which other models use convolution with bias, haha.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89128
Approved by: https://github.com/ezyang