[PyTorch Edge] Make contexts thread local for quantized matmul (#74676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74676
We don't want to create and destroy a new context on every multiplication, so the quantized matmul context is made thread local and reused across calls.
Test Plan:
From fbcode:
```buck test caffe2/test:quantization -- test_qmatmul```
# Performance Improvement
*Benchmarking was done on a model which performs matmuls of the same shapes and counts as the Transformer model, as determined in D30901505*
*Notebook in which benchmarking was performed: https://www.internalfb.com/intern/anp/view/?id=1582075&revision_id=1891629751047842*
**Improvement from this diff alone**
~9.71% reduction in latency
- Non Thread Local Contexts (before this diff, D35087184 v2): [8.5410ms](https://www.internalfb.com/intern/aibench/details/661728682381311)
- Thread Local Contexts (this diff, v12): [7.7113ms](https://www.internalfb.com/intern/aibench/details/956655867696198)
**FP32 Matmul vs Quantized Matmul, Overall Improvement from this diff stack**
~56% reduction in latency compared to FP32 matmul, ~71% reduction compared to naive quantized matmul
- FP32 Matmul: [17.4910ms](https://www.internalfb.com/intern/aibench/details/875394396322469)
- Quantized Matmul (after this diff): [7.7113ms](https://www.internalfb.com/intern/aibench/details/956655867696198)
- Naive Quantized Matmul (dequantize → fp32 matmul → quantize, sketched below): [26.8639ms](https://www.internalfb.com/intern/aibench/details/52181682131461)
Reviewed By: kimishpatel
Differential Revision: D34756288
fbshipit-source-id: b000658152cf71b4185dcd34a3cccc71b4cec1f0
(cherry picked from commit 5bc7ef6b5c3255388eb8fab230e44073004d2266)