Add a fast path for batch-norm CPU inference. (#19152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19152
Add a fast path for batch-norm CPU inference when all tensors are contiguous.
* Leverage compiler auto-vectorization through simple loops.
* Fold the linear terms into a per-channel scale and shift before the main loop (see the sketch after this list).
* For ResNeXt-101, this version is 18.95x faster.
* Add a microbenchmark:
* (buck build mode/opt -c python.package_style=inplace --show-output //caffe2/benchmarks/operator_benchmark:batchnorm_benchmark) && \
(OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/batchnorm_benchmark#binary.par)
* batch_norm: data shape: [1, 256, 3136], bandwidth: 22.26 GB/s
* batch_norm: data shape: [1, 65536, 1], bandwidth: 5.57 GB/s
* batch_norm: data shape: [128, 2048, 1], bandwidth: 18.21 GB/s
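As a rough sketch of the folded fast path (assuming contiguous NCHW input; the function name and signature below are illustrative, not the PR's actual kernel):

    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Batch-norm inference over a contiguous [N, C, HxW] buffer.
    void batch_norm_cpu_inference_contiguous(
        float* out, const float* in,
        const float* gamma, const float* beta,
        const float* mean, const float* var,
        float eps, int64_t N, int64_t C, int64_t HxW) {
      // Fold the linear terms once per channel:
      //   y = gamma * (x - mean) / sqrt(var + eps) + beta
      //     = alpha * x + b
      // where alpha = gamma / sqrt(var + eps) and b = beta - alpha * mean.
      std::vector<float> alpha(C), b(C);
      for (int64_t c = 0; c < C; ++c) {
        const float inv_std = 1.0f / std::sqrt(var[c] + eps);
        alpha[c] = gamma[c] * inv_std;
        b[c] = beta[c] - alpha[c] * mean[c];
      }
      // Apply per element with simple loops the compiler can auto-vectorize.
      for (int64_t n = 0; n < N; ++n) {
        for (int64_t c = 0; c < C; ++c) {
          const float a = alpha[c];
          const float shift = b[c];
          const float* x = in + (n * C + c) * HxW;
          float* y = out + (n * C + c) * HxW;
          for (int64_t i = 0; i < HxW; ++i) {
            y[i] = a * x[i] + shift;
          }
        }
      }
    }

Folding reduces the per-element work from a subtract, divide, multiply, and add to a single multiply-add over contiguous memory, which is why the benchmark above reports throughput in GB/s: the contiguous path is effectively memory-bandwidth bound.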
Reviewed By: soumith, BIT-silence
Differential Revision: D14889728
fbshipit-source-id: 20c9e567e38ff7dbb9097873b85160eca2b0a795