Fix and reenable threaded QNNPACK linear (#40587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40587
Previously, threaded QNNPACK linear hit a divide-by-zero while
calculating tiling parameters for the threads, but only in the
multithreaded empty-batch case. In my opinion, the bug here is passing
a value that is allowed to be zero (batch size) as an argument that
must not be zero (tile size), so I fixed it by bailing out on an empty
batch right before the call to pthreadpool_compute_4d_tiled.
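
For illustration only, here is a minimal sketch of the failure mode and
the guard. It is not the actual QNNPACK or pthreadpool code: the names
count_tiles and plan_tiles, and the channel_tile parameter, are
hypothetical stand-ins, and the assumption is simply that the tiled
dispatch divides each range by its tile size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of tiles needed to cover `range`.
 * It divides by `tile`, so `tile` must be nonzero. */
static size_t count_tiles(size_t range, size_t tile) {
  assert(tile != 0);
  return (range + tile - 1) / tile;
}

/* Hypothetical stand-in for the operator's run step. Returns how many
 * tiles would be dispatched; 0 means there is no work to do. */
static size_t plan_tiles(size_t batch_size, size_t output_channels,
                         size_t channel_tile) {
  /* Bail out before any tiling math: an empty batch would otherwise be
   * used as a tile size and trigger a divide-by-zero. */
  if (batch_size == 0) {
    return 0;
  }
  /* Assumption for this sketch: the batch size doubles as the tile size
   * along the batch dimension, mirroring the bug described above. */
  return count_tiles(batch_size, batch_size) *
         count_tiles(output_channels, channel_tile);
}
```

With the early return in place, plan_tiles(0, 8, 4) reports no work
instead of reaching the division, while nonempty batches are tiled as
before.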
Test Plan: TestQuantizedOps.test_empty_batch
Differential Revision: D22264414
Pulled By: dreiss
fbshipit-source-id: 9446d5231ff65ef19003686f3989e62f04cf18c9