pytorch
71d5a8ea - [nnc] Benchmark inference batchnorm (#52251)

Commit
3 years ago
[nnc] Benchmark inference batchnorm (#52251) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52251 Batchnorm in inference is just a bunch of pointwise ops. NNC should be able to do a good job of this, and indeed it does. For fun I've included a fused BN->Relu (although the real fusion fun would be Conv->BN->Relu...). ``` --------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------- BatchNorm/ATen/1/64/112/112 252886 ns 252875 ns 2785 GB/s=25.3981G/s BatchNorm/ATen/1/256/14/14 12145 ns 12145 ns 55347 GB/s=33.0525G/s BatchNorm/ATen/1/128/28/28 18919 ns 18918 ns 37749 GB/s=42.437G/s BatchNorm/ATen/1/64/56/56 61434 ns 61433 ns 11315 GB/s=26.1363G/s BatchNorm/ATen/1/512/7/7 11924 ns 11923 ns 59070 GB/s=16.8327G/s BatchNorm/ATen/5/64/112/112 1873321 ns 1873292 ns 382 GB/s=17.1424G/s BatchNorm/ATen/5/256/14/14 83470 ns 83459 ns 8538 GB/s=24.0483G/s BatchNorm/ATen/5/128/28/28 157521 ns 157520 ns 4440 GB/s=25.4829G/s BatchNorm/ATen/5/64/56/56 314675 ns 314670 ns 2235 GB/s=25.513G/s BatchNorm/ATen/5/512/7/7 48129 ns 48128 ns 14582 GB/s=20.851G/s BatchNorm/NNC/1/64/112/112 249454 ns 249428 ns 2802 GB/s=25.749G/s BatchNorm/NNC/1/256/14/14 9321 ns 9321 ns 74573 GB/s=43.066G/s BatchNorm/NNC/1/128/28/28 16874 ns 16873 ns 40999 GB/s=47.5797G/s BatchNorm/NNC/1/64/56/56 59276 ns 59275 ns 12047 GB/s=27.0878G/s BatchNorm/NNC/1/512/7/7 3452 ns 3452 ns 202610 GB/s=58.1394G/s BatchNorm/NNC/5/64/112/112 1820201 ns 1820038 ns 373 GB/s=17.6439G/s BatchNorm/NNC/5/256/14/14 78429 ns 78420 ns 8871 GB/s=25.5935G/s BatchNorm/NNC/5/128/28/28 155214 ns 155202 ns 4514 GB/s=25.8635G/s BatchNorm/NNC/5/64/56/56 311454 ns 311449 ns 2163 GB/s=25.7768G/s BatchNorm/NNC/5/512/7/7 26853 ns 26851 ns 25283 GB/s=37.3735G/s BatchNorm/ATenRelu/1/64/112/112 378879 ns 378849 ns 1844 GB/s=16.9528G/s BatchNorm/ATenRelu/1/256/14/14 16707 ns 16705 ns 41391 GB/s=24.029G/s BatchNorm/ATenRelu/1/128/28/28 30235 ns 30235 ns 23060 GB/s=26.5529G/s BatchNorm/ATenRelu/1/64/56/56 91164 ns 91160 ns 7662 GB/s=17.6132G/s BatchNorm/ATenRelu/1/512/7/7 14681 ns 14681 ns 46088 GB/s=13.6707G/s BatchNorm/ATenRelu/5/64/112/112 2864060 ns 2863566 ns 243 GB/s=11.2142G/s BatchNorm/ATenRelu/5/256/14/14 118376 ns 118367 ns 5907 GB/s=16.9561G/s BatchNorm/ATenRelu/5/128/28/28 237893 ns 237873 ns 2936 GB/s=16.8749G/s BatchNorm/ATenRelu/5/64/56/56 472452 ns 472386 ns 1479 GB/s=16.9949G/s BatchNorm/ATenRelu/5/512/7/7 61389 ns 61379 ns 11442 GB/s=16.3496G/s BatchNorm/NNCRelu/1/64/112/112 248378 ns 248341 ns 2812 GB/s=25.8618G/s BatchNorm/NNCRelu/1/256/14/14 9965 ns 9964 ns 76013 GB/s=40.2861G/s BatchNorm/NNCRelu/1/128/28/28 16153 ns 16153 ns 43343 GB/s=49.7004G/s BatchNorm/NNCRelu/1/64/56/56 58761 ns 58757 ns 12095 GB/s=27.3265G/s BatchNorm/NNCRelu/1/512/7/7 10529 ns 10529 ns 66590 GB/s=19.0625G/s BatchNorm/NNCRelu/5/64/112/112 1799001 ns 1798757 ns 362 GB/s=17.8527G/s BatchNorm/NNCRelu/5/256/14/14 78252 ns 78246 ns 8974 GB/s=25.6504G/s BatchNorm/NNCRelu/5/128/28/28 154940 ns 154923 ns 4483 GB/s=25.9102G/s BatchNorm/NNCRelu/5/64/56/56 312329 ns 312324 ns 2244 GB/s=25.7046G/s BatchNorm/NNCRelu/5/512/7/7 51203 ns 51199 ns 13559 GB/s=19.6004G/s ``` Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D26440786 Pulled By: bertmaher fbshipit-source-id: 7d3f7bf6eee4c37736e9875d31ae1b483af9fb6f
Author
Parents
Loading