bring back skipped bitwise dispatch (#25689)
Summary:
Before https://github.com/pytorch/pytorch/issues/24879, `bitwise_not` called into `at::bitwise_not_out`, which goes through the device dispatch. After that PR it is dispatched directly to `at::native::bitwise_not_out`, which only has CPU and CUDA implementations. Skipping the `at::` dispatch broke XLA, but XLA had no unary tests, so we didn't notice until a test was added in https://github.com/pytorch/xla/pull/986. :P
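To make the difference concrete, here is a minimal sketch (not the actual PyTorch source; the out-variant signatures and includes are assumed from that era's ATen API) contrasting the dispatched call with the direct native call:

```cpp
// Sketch only: illustrates why the dispatch level matters for out-of-tree backends.
#include <ATen/ATen.h>              // at:: dispatched entry points
#include <ATen/NativeFunctions.h>   // at::native:: concrete CPU/CUDA kernels

at::Tensor bitwise_not_dispatched(const at::Tensor& self) {
  at::Tensor result = at::empty_like(self);
  // Goes through the device dispatch, so an XLA tensor reaches the XLA kernel.
  at::bitwise_not_out(result, self);
  return result;
}

at::Tensor bitwise_not_skipping_dispatch(const at::Tensor& self) {
  at::Tensor result = at::empty_like(self);
  // Calls the native implementation directly; only CPU and CUDA kernels exist
  // here, so XLA tensors never reach the XLA backend (the breakage described above).
  at::native::bitwise_not_out(result, self);
  return result;
}
```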
This PR tries to fix the breakage and avoid a revert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25689
Differential Revision: D17201071
Pulled By: ailzhang
fbshipit-source-id: 0ca560a14a2ec6141f3795479c6dcb460e3805b5