embedding_bag make_bag_size optimization (#30701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30701
From James' PR https://github.com/pytorch/pytorch/pull/19715
embedding_bag microbenchmarks:
Baseline: P123020983
Refactor make_bag_size, without changing at::zeros to at::empty (this diff): P123021393
Inference benchmark on T6_SKL - _embedding_bag self time only:
bs=40, baseline: 0.302 ms/iter
bs=40, with diff: 0.244 ms/iter
bs=1, baseline: 0.148 ms/iter
bs=1, with diff: 0.124 ms/iter
The bigger gap comes from fb::embedding_bag_byte_rowwise_offsets; I'm looking into that one as well.
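For context, a minimal Python sketch of the quantity make_bag_size derives: the length of each bag, computed from the offsets into the flat indices array. This is an illustrative re-implementation, not the ATen C++ code; the function name and signature are hypothetical.

```python
# Hypothetical sketch of the bag-size computation done inside
# _embedding_bag (illustrative only; the real code operates on
# ATen tensors in C++).
def make_bag_sizes(offsets, num_indices):
    """offsets[i] is the start of bag i in the flat indices array;
    the last bag extends to num_indices."""
    bag_sizes = []
    for i, start in enumerate(offsets):
        # A bag ends where the next one starts, except for the last bag.
        end = offsets[i + 1] if i + 1 < len(offsets) else num_indices
        bag_sizes.append(end - start)
    return bag_sizes

# Example: 3 bags over 7 indices
print(make_bag_sizes([0, 2, 5], 7))  # -> [2, 3, 2]
```

Since every element of the result is overwritten, the output buffer can be allocated uninitialized (at::empty) rather than zero-filled (at::zeros), which is the kind of saving this diff targets.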
Test Plan:
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./inference_benchmark_nolr_emb.par --pt-scripted-model=traced_model.pt --pt-inputs="batch_size_40/pt_inputs.pth" --iters=3000 --warmup-iters=100
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 3000 --operators embeddingbag
Reviewed By: yinghai, qizzzh
Differential Revision: D18800166
fbshipit-source-id: 820e6ece0b6ade72ee42409661f92c548f43a4cb