[PyTorch Edge] Use Integer Subtraction (Instead of Float) in Non-FBGEMM Dequantization (#67115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67115
This makes the non-FBGEMM dequantization path subtract the zero point in the integer domain before converting to float, matching what FBGEMM does (https://fburl.com/code/vjrdn6tj → https://fburl.com/code/btkdn24l).
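As a rough sketch of the idea (function names are illustrative, not the actual code in this diff), affine dequantization computes `(q - zero_point) * scale`, and the change amounts to performing the subtraction while the value is still an integer rather than after the cast to float:

```cpp
#include <cstdint>

// Sketch only: dequantize a quint8 value as (q - zero_point) * scale.

// Before: subtract in the float domain.
float dequantize_float_sub(uint8_t q, int32_t zero_point, float scale) {
  return (static_cast<float>(q) - static_cast<float>(zero_point)) * scale;
}

// After: subtract in the integer domain (as FBGEMM does), then convert
// the signed difference to float and scale.
float dequantize_int_sub(uint8_t q, int32_t zero_point, float scale) {
  return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```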
Benchmark results for the mobile vision transformer model (benchmark as described in D31066997; config from rebasing onto v4 of D31869106):
This diff (v18):
- NET latency: 109.866 ms
- https://our.intern.facebook.com/intern/aibench/details/536304563225483
This diff before using vsubl (v14, rebased onto v22 of D31205883, the previous diff in this stack; see the vsubl sketch after these results):
- NET latency: 115.887 ms
- https://our.intern.facebook.com/intern/aibench/details/906978557243297
Before this diff (v22 of D31205883):
- NET latency: 116.449 ms
- https://our.intern.facebook.com/intern/aibench/details/870678436773989
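For the vectorized path, `vsubl` performs a widening integer subtract, so the zero point can be removed from eight lanes at once before the int-to-float conversion. A minimal NEON sketch under that assumption (the function name and layout are illustrative, not the code from this diff):

```cpp
#include <arm_neon.h>
#include <cstdint>

// Sketch only: dequantize 8 quint8 values, subtracting the zero point
// in the integer domain via vsubl before converting to float.
void dequantize8(const uint8_t* q, float* out, uint8_t zero_point,
                 float scale) {
  const uint8x8_t vq  = vld1_u8(q);
  const uint8x8_t vzp = vdup_n_u8(zero_point);
  // Widening subtract; the uint16 result reinterpreted as int16 is the
  // correct signed difference (|q - zp| <= 255 fits in int16).
  const int16x8_t vdiff = vreinterpretq_s16_u16(vsubl_u8(vq, vzp));
  // Widen each half to int32, convert to float, and apply the scale.
  const int32x4_t vlo = vmovl_s16(vget_low_s16(vdiff));
  const int32x4_t vhi = vmovl_s16(vget_high_s16(vdiff));
  const float32x4_t vscale = vdupq_n_f32(scale);
  vst1q_f32(out,     vmulq_f32(vcvtq_f32_s32(vlo), vscale));
  vst1q_f32(out + 4, vmulq_f32(vcvtq_f32_s32(vhi), vscale));
}
```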
ghstack-source-id: 142166375
Test Plan: Phabricator tests; quantized_test passes on a Pixel 3a; and the mobile vision transformer model (as described in D31066997) runs correctly.
Reviewed By: kimishpatel
Differential Revision: D31483135
fbshipit-source-id: fbef00cad6087b49900d21c3dd3b6fd432f64e94