Perf improvement of Conv2d and Conv3d (#40324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40324
1) Avoid the use of item.
2) Bypass im2col for 1x1 convolutions.
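Why the im2col step can be skipped for a 1x1 kernel: with stride 1 and no padding, each output pixel depends only on the input pixel at the same location, so the convolution reduces to a single matrix multiply over the channel dimension. The NumPy sketch below illustrates that equivalence only; it is not the code changed in this PR, and the helper names `conv2d_naive` and `conv2d_1x1_matmul` are hypothetical.

```python
import numpy as np


def conv2d_naive(x, w):
    """Reference convolution (stride 1, no padding, groups=1).

    x: (N, C, H, W), w: (M, C, kh, kw). Hypothetical helper for illustration.
    """
    n, c, h, wd = x.shape
    m, _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((n, m, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, :, i:i + kh, j:j + kw]  # (N, C, kh, kw)
            # Contract over channels and kernel window -> (N, M)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out


def conv2d_1x1_matmul(x, w):
    """1x1 convolution as a plain matmul, with no patch extraction.

    Assumes a 1x1 kernel, stride 1, no padding, groups=1.
    """
    n, c, h, wd = x.shape
    m = w.shape[0]
    w_mat = w.reshape(m, c)            # (M, C)
    x_mat = x.reshape(n, c, h * wd)    # (N, C, H*W), a view of the input
    return (w_mat @ x_mat).reshape(n, m, h, wd)


# Same shapes as the benchmark in the test plan (N=1, C=512, H=W=4, M=512).
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((1, 512, 4, 4))
w_demo = rng.standard_normal((512, 512, 1, 1))
assert np.allclose(conv2d_naive(x_demo, w_demo), conv2d_1x1_matmul(x_demo, w_demo))
```

For larger kernels, strides other than 1, or nonzero padding, the input patches overlap or shift, so this shortcut does not apply and a patch-extraction step such as im2col is still needed.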
Test Plan:
Unit tests, plus the perf benchmark below to show the improvement:
```
import numpy as np
import torch
from timeit import Timer

# Benchmark configuration: a 1x1 convolution on a small feature map.
num = 50
N = 1
C = 512
H = 4
W = 4
M = 512
kernel_h = 1
kernel_w = 1
stride_h = 1
stride_w = 1
padding_h = 0
padding_w = 0

X_np = np.random.randn(N, C, H, W).astype(np.float32)
W_np = np.random.randn(M, C, kernel_h, kernel_w).astype(np.float32)
X = torch.from_numpy(X_np)

conv2d_pt = torch.nn.Conv2d(
    C, M, (kernel_h, kernel_w), stride=(stride_h, stride_w),
    padding=(padding_h, padding_w), groups=1, bias=True)


class ConvNet(torch.nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv2d = conv2d_pt

    def forward(self, x):
        return self.conv2d(x)


model = ConvNet()


def pt_forward():
    # with torch.autograd.profiler.profile(record_shapes=True) as prof:
    model(X)
    # print(prof.key_averages().table(sort_by="self_cpu_time_total"))


# Disable MKL-DNN so the native convolution path is the one being timed.
torch._C._set_mkldnn_enabled(False)

t = Timer("pt_forward()", "from __main__ import pt_forward, X")
# Assumed reporting step (not in the original snippet): total time for
# `num` calls, printed in the "pt time = ..." format shown below.
print("pt time = {}".format(t.timeit(num)))
```
Before the optimization:
pt time = 5.841153813526034
After the optimization:
pt time = 4.513134760782123
That is roughly a 1.29x speedup (about 23% less time) for this 1x1 conv.
Differential Revision: D22149067
fbshipit-source-id: 538d9eea5b729e6c3da79444bde1784bde828876