pytorch
20aa417e - [PyTorch] [Quantization] Speed up PackedEmbeddingBagWeight::prepack() (#66632)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66632

Calling `.item<float>()` for each element in a tensor is expensive. Instead, convert the entire Tensor in one call to `Tensor::copy_(input_tensor)`. See [this post](https://fb.workplace.com/groups/1144215345733672/posts/2080756188746245/) for more details.

ghstack-source-id: 140639868

Test Plan: Build and run with bundled inputs.

### AI Bench

Before: [AI Bench](https://www.internalfb.com/intern/aibench/details/877359346171823), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v6_perf_1634185889953.html): 500ms

After: [AI Bench](https://www.internalfb.com/intern/aibench/details/60828780633319), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v6_perf_1634231176980.html): 444ms

We went from 500ms to 444ms, a reduction of ~11%.

Reviewed By: supriyar

Differential Revision: D31657430

fbshipit-source-id: 199ec9de3dab84bb5727d81c7804bb83bebf7b48
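The optimization pattern here is generic: N per-element accessor calls, each paying dispatch and bounds-check overhead, are replaced by a single bulk copy. As a minimal sketch in plain Python (not the actual ATen API; `FakeTensor`, `item`, and the call counters are illustrative stand-ins), the difference in call counts looks like this:

```python
# Sketch of the commit's idea: replace N per-element reads with one bulk copy.
# FakeTensor is a hypothetical stand-in for at::Tensor; the counters model
# per-call overhead (dispatch, checks) that the real .item<float>() pays.
calls = {"item": 0, "copy_": 0}

class FakeTensor:
    def __init__(self, data):
        self.data = list(data)

    def item(self, i):
        # per-element access: one "expensive" call per element
        calls["item"] += 1
        return self.data[i]

    def copy_(self, other):
        # bulk copy: a single call, then a tight memcpy-like inner loop
        calls["copy_"] += 1
        self.data[:] = other.data

src = FakeTensor(range(1000))

# Before: one .item()-style call per element
slow_out = [src.item(i) for i in range(1000)]

# After: a single copy_() call for the whole tensor
fast_out = FakeTensor([0.0] * 1000)
fast_out.copy_(src)

print(calls["item"], calls["copy_"])  # → 1000 1
```

Both paths produce the same data; only the number of "expensive" calls changes, which is where the 500ms → 444ms win comes from.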