[FX] Prototype Conv/BN fuser in FX (#47657)
Summary:
Some interesting stuff going on. All benchmarks are run with both my implementation and the current quantized fuser.
For these benchmarks, things like whether MKLDNN/FBGEMM are enabled make a big difference.
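Roughly, the per-model numbers come from a simple timing loop along these lines (the batch size, iteration count, and `bench` helper here are illustrative, not the exact script used):
```
import time
import torch
import torchvision.models as models

def bench(model, inp, iters=10):
    # Warm up once, then time `iters` forward passes without autograd.
    with torch.no_grad():
        model(inp)
        start = time.time()
        for _ in range(iters):
            model(inp)
        return time.time() - start

model = models.resnet18().eval()
inp = torch.randn(1, 3, 224, 224)
print("non-fused:", bench(model, inp))
```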
## Manual compilation (everything turned off)
In the small (toy) case, things look good:
```
non-fused: 1.174886703491211
fused: 0.7494957447052002
```
However, for `torchvision.resnet18`, we see:
```
non-fused: 1.2272708415985107
fused: 3.7183213233947754
```
This is because, without any acceleration libraries, Conv (no bias) -> BatchNorm is actually faster than a single Conv (with bias): fusing folds the BatchNorm into the conv and forces the conv to carry a bias term.
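For context, a minimal sketch of that folding for an eval-mode, affine BatchNorm2d (the `fuse_conv_bn` helper is illustrative, not the actual FX pass):
```
import copy
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold an eval-mode BatchNorm2d into the preceding Conv2d.
    fused = copy.deepcopy(conv)
    w = conv.weight
    b = conv.bias if conv.bias is not None else torch.zeros(
        conv.out_channels, device=w.device, dtype=w.dtype)
    # Per-output-channel scale derived from the BN running statistics.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight = nn.Parameter(w * scale.reshape(-1, 1, 1, 1))
    # The fused conv always ends up with a bias, even if the original conv
    # had none -- which is why the fused op can lose to Conv (no bias) -> BN
    # when no fast conv backend is available.
    fused.bias = nn.Parameter((b - bn.running_mean) * scale + bn.bias)
    return fused
```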
## Nightly (CPU)
```
Toy
non-fused: 0.45807552337646484
fused: 0.34779977798461914
resnet18
non-fused: 0.14216232299804688
fused: 0.13438796997070312
resnet50
non-fused: 0.2999534606933594
fused: 0.29364800453186035
densenet161
non-fused: 0.6558926105499268
fused: 0.6190280914306641
inception_v3
non-fused: 1.2804391384124756
fused: 1.181272029876709
```
These numbers are with MKLDNN enabled.
We see a small performance gain across the board, with more significant gains for the smaller models.
## Nightly (CUDA)
```
M
non-fused: 1.2220964431762695
fused: 1.0833759307861328
resnet18
non-fused: 0.09721899032592773
fused: 0.09089207649230957
resnet50
non-fused: 0.2053072452545166
fused: 0.19138741493225098
densenet161
non-fused: 0.6830024719238281
fused: 0.660109281539917
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47657
Reviewed By: eellison
Differential Revision: D25127546
Pulled By: Chillee
fbshipit-source-id: ecdf682038def046045fcc09faf9aeb6c459b5e3