pytorch
26cba842 - Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU (#92530)

Commit
3 years ago
Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU (#92530)

This PR optimizes `ConvTranspose2d` with oneDNN and adds channels last support for it. The fallback path `slow_conv_transpose2d` also gains channels last support, so memory format propagation behaves the same with or without oneDNN.

This replaces https://github.com/pytorch/pytorch/pull/77060, https://github.com/pytorch/pytorch/pull/70897 and https://github.com/pytorch/pytorch/pull/74023, which enable oneDNN for `ConvTranspose2d` and `ConvTranspose3d`.

The following results were collected on a Skylake Xeon 8180, dual sockets, 28 cores per socket. All times are in milliseconds.

### single core channels last

configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 181.36 | 91.16 | 1.99 | 531.38 | 124.08 | 4.28
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 324.35 | 153.50 | 2.11 | 973.16 | 185.97 | 5.23
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 1086.82 | 671.52 | 1.62 | 3008.94 | 1453.33 | 2.07

### single core channels first

configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.05

### single socket channels last

configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.0

### single socket channels first

configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 132.56 | 7.19 | 18.43 | 31.43 | 11.20 | 2.81
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 227.94 | 13.33 | 17.11 | 63.00 | 23.41 | 2.69
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 473.68 | 52.79 | 8.97 | 150.40 | 87.33 | 1.72

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92530
Approved by: https://github.com/jgong5, https://github.com/ezyang
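
The memory format propagation described above can be checked with a short script. This is a minimal sketch and not part of the PR: the helper name `run_conv_transpose` and the small tensor shapes are chosen here for illustration, and it assumes a CPU build of PyTorch that includes this change. It runs the same channels last forward pass with oneDNN enabled and with it disabled (which should exercise the fallback path) and reports whether the output stays in channels last.

```python
import torch


def run_conv_transpose(mkldnn_enabled: bool) -> torch.Tensor:
    """Run a channels last ConvTranspose2d forward pass on CPU.

    Hypothetical helper for illustration; shapes are deliberately small
    and are not the benchmark configurations from the tables.
    """
    torch.manual_seed(0)
    layer = torch.nn.ConvTranspose2d(8, 8, kernel_size=3)
    layer = layer.to(memory_format=torch.channels_last)
    x = torch.randn(2, 8, 16, 16).contiguous(memory_format=torch.channels_last)
    # Toggle the oneDNN (mkldnn) backend for the duration of the call.
    with torch.backends.mkldnn.flags(enabled=mkldnn_enabled), torch.no_grad():
        return layer(x)


y_default = run_conv_transpose(True)
y_fallback = run_conv_transpose(False)

# Both paths are expected to propagate channels last to the output.
print(y_default.is_contiguous(memory_format=torch.channels_last))
print(y_fallback.is_contiguous(memory_format=torch.channels_last))
```

If either check prints `False`, the installed build may predate this commit or may select a different backend for these shapes.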