pytorch
a4348358 - A heuristic to avoid perf incompatible MKLDNN formats for binary ops (#56089)

Commit
3 years ago
A heuristic to avoid perf incompatible MKLDNN formats for binary ops (#56089)

Summary:
After adding new ops to a set of fusible ops, mobilenetv3 slows down to **9000ms from 1200ms** without this fix.

The slowdown occurs when one input of a binary op has been expanded and converted to nchw/nhwc while the second argument is in a blocked format. In this case, MKLDNN falls back to its reference implementation for the binary operation that follows these broadcasts, which can be up to ~100x slower. We use a very simple heuristic: convert the arg in nchw to the blocked format of the other argument.

* MKLDNN_VERBOSE without the issue: [test_mobilenet_nopool.txt](https://github.com/pytorch/pytorch/files/6319528/test_mobilenet_nopool.txt)
* MKLDNN_VERBOSE with the issue (note the times for `ref` operations): [test_mobilenet_pool.txt](https://github.com/pytorch/pytorch/files/6319529/test_mobilenet_pool.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56089

Reviewed By: eellison

Differential Revision: D27796688

Pulled By: Krovatkin

fbshipit-source-id: fc34d76358ce899e3b1f2b69efb9b5c38f5af1ad
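
The decision the heuristic makes can be sketched in plain Python. This is only an illustration of the idea described in the summary, not the code from the PR: the `Arg` class, the `plan_binary_op_formats` function, and the format strings (including the blocked tag `nChw16c`) are hypothetical names chosen for the example, and no real MKLDNN or PyTorch API is called.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption for this sketch: only these two tags count as "plain" layouts.
PLAIN_FORMATS = {"nchw", "nhwc"}


@dataclass(frozen=True)
class Arg:
    """Hypothetical stand-in for a binary-op input and its memory format."""
    name: str
    fmt: str  # e.g. "nchw", "nhwc", or a blocked tag such as "nChw16c"


def is_plain(fmt: str) -> bool:
    """True if the format is one of the plain (non-blocked) layouts."""
    return fmt in PLAIN_FORMATS


def plan_binary_op_formats(lhs: Arg, rhs: Arg) -> Tuple[Arg, Arg]:
    """Pick formats for a binary op so plain and blocked inputs are not mixed.

    If exactly one input is plain and the other is blocked, the plain input
    is marked for conversion to the other input's blocked format. In every
    other case the inputs are returned unchanged.
    """
    if is_plain(lhs.fmt) and not is_plain(rhs.fmt):
        return Arg(lhs.name, rhs.fmt), rhs
    if is_plain(rhs.fmt) and not is_plain(lhs.fmt):
        return lhs, Arg(rhs.name, lhs.fmt)
    return lhs, rhs


if __name__ == "__main__":
    # A broadcast input left in nchw next to a blocked conv output.
    expanded = Arg("expanded_scale", "nchw")
    conv_out = Arg("conv_out", "nChw16c")
    print(plan_binary_op_formats(expanded, conv_out))
```

The sketch only decides which format each input should have; in a real pipeline the actual reorder would be performed by the backend. The trade-off is one extra reorder of the plain input in exchange for keeping the binary op off the slow reference path.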