9b89ff09 - [Quant] onednn backend switch to ideep new API without affecting performance (#90354)

**Summary**
The onednn quantization backend switches to the new API in `third_party/ideep`:
- `struct forward_params` for conv/deconv has changed; the primitive cache is modified accordingly.
- Use the new versions of the `prepare` and `compute` APIs, with the fp32 and int8 paths separated. The old versions will be deprecated.
- `ideep::tensor::reorder_if_differ_in` now supports block-to-block reorder, so it is used instead of the custom util function `onednn_utils::try_reorder`.
- With the new transposed-convolution API, a flag keeps the weight desc aligned with oneDNN, so the weight no longer needs to be transposed explicitly in PyTorch.
- The `is_channels_last` flag specifies the layout of src/dst when querying the expected weight desc.

Correctness is not affected, and performance should be unaffected or slightly better. The FBGEMM and QNNPACK backends are not affected. Performance results are given below.

1. End-to-end performance of static quantized models (from torchvision) (throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)
2. Op benchmark of dynamic quantized linear (latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Multiple instances on a single node, one core per instance
- Jemalloc and Intel OpenMP

**Test plan**
`python test/test_quantization.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90354
Approved by: https://github.com/jgong5
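Since the change is meant to be transparent to users of the quantized backends, one way to sanity-check it is to run a static and a dynamic quantized model with the `onednn` engine selected. The sketch below is not part of the commit: it uses PyTorch's eager-mode quantization API, and the toy model, shapes, and the assumption of a PyTorch build with onednn support are all illustrative.

```python
# Minimal sketch: exercise the onednn quantized backend for the two cases
# benchmarked above (static quantized conv, dynamic quantized linear).
# Assumes a PyTorch build where the 'onednn' engine is available.
import torch
import torch.nn as nn
import torch.ao.quantization as tq

torch.backends.quantized.engine = "onednn"

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = nn.Conv2d(3, 64, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

# Static (post-training) quantization, eager mode.
m = M().eval()
m.qconfig = tq.get_default_qconfig("onednn")
tq.prepare(m, inplace=True)
# Channels-last input, matching the is_channels_last point in the summary.
x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)
m(x)  # calibration pass
tq.convert(m, inplace=True)
print(m(x).shape)

# Dynamic quantization of Linear, matching the dynamic linear op benchmark.
lin = nn.Sequential(nn.Linear(1024, 1024)).eval()
dq = tq.quantize_dynamic(lin, {nn.Linear}, dtype=torch.qint8)
print(dq(torch.randn(8, 1024)).shape)
```

Pointing the same script at `fbgemm` (where supported) gives an A/B comparison in the spirit of the throughput and latency tables above, which is how one would confirm the "unaffected or slightly better" claim on a given machine.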