Re-apply PyTorch pthreadpool changes (#40951)
* Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (https://github.com/pytorch/pytorch/commit/b9d3869df357038f798eef579fe1c69cf246887d) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
* Enable XNNPACK ops on iOS and macOS.
Test Plan: buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform ios --framework pytorch --remote --devices D221AP-12.0.1
Reviewed By: xta0
Differential Revision: D21886736
fbshipit-source-id: ac482619dc1b41a110a3c4c79cc0339e5555edeb
* Respect user set thread count. (#40707)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40707
Test Plan: Imported from OSS
Differential Revision: D22318197
Pulled By: AshkanAliabadi
fbshipit-source-id: f11b7302a6e91d11d750df100d2a3d8d96b5d1db
* Fix and reenable threaded QNNPACK linear (#40587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40587
Previously, this caused a divide-by-zero, but only in the multithreaded
empty-batch case, while calculating tiling parameters for the threads.
In my opinion, the bug here is using a value that is allowed to be zero
(batch size) for an argument that should not be zero (tile size), so I
fixed the bug by bailing out right before the call to
pthreadpool_compute_4d_tiled.
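
A minimal sketch of the early bail-out described above, written in plain Python rather than the actual QNNPACK C code. The names `divide_round_up` and `plan_batch_tiles` and the `max_tile` parameter are hypothetical and only illustrate how a tile size derived from the batch size becomes zero for an empty batch:

```python
def divide_round_up(n: int, q: int) -> int:
    """Ceiling division; raises ZeroDivisionError when q == 0."""
    return (n + q - 1) // q


def plan_batch_tiles(batch_size: int, max_tile: int = 4) -> int:
    """Return how many tiles the batch dimension is split into.

    Hypothetical stand-in for the tiling setup; not the QNNPACK code.
    """
    # Bail out before any tiling math: an empty batch has no work to do.
    if batch_size == 0:
        return 0
    # The tile size is derived from the batch size, so without the guard
    # above it would be 0 for an empty batch and the division would fail.
    tile = min(batch_size, max_tile)
    return divide_round_up(batch_size, tile)
```

Under these assumptions, `plan_batch_tiles(0)` returns before any division takes place, while a non-empty batch follows the usual tiling path.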
Test Plan: TestQuantizedOps.test_empty_batch
Differential Revision: D22264414
Pulled By: dreiss
fbshipit-source-id: 9446d5231ff65ef19003686f3989e62f04cf18c9
* Fix batch size zero for QNNPACK linear_dynamic (#40588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40588
Two bugs were preventing this from working. One was a divide-by-zero
when multithreading was enabled, fixed similarly to the fix for static
quantized linear in the previous commit. The other was the computation of
min and max to determine qparams. FBGEMM uses [0,0] for [min,max] of
empty input, so this change does the same.
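
The empty-input convention can be sketched as follows. This is an illustration in plain Python, not the code from this change; the helper name `min_max_for_qparams` is hypothetical:

```python
from typing import Sequence, Tuple


def min_max_for_qparams(values: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) range used to derive quantization parameters.

    Hypothetical helper; mirrors the [0,0] convention for empty input.
    """
    # min() and max() are undefined on an empty sequence, so report the
    # range of an empty input as [0, 0].
    if len(values) == 0:
        return 0.0, 0.0
    return min(values), max(values)
```

With this guard, an empty batch yields a well-defined range instead of failing in the min/max reduction.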
Test Plan: Added a unit test.
Differential Revision: D22264415
Pulled By: dreiss
fbshipit-source-id: 6ca9cf48107dd998ef4834e5540279a8826bc754
Co-authored-by: David Reiss <dreiss@fb.com>