Fix vec256 inversion (#15659)
Summary:
soumith zou3519
I was browsing the code and think `vec256_int.h` might need a minor revision, but I am not 100% sure.
1. It currently inverts the comparison result by `XOR` with 0, which leaves every bit unchanged. Should it `XOR` with all ones (`-1`) instead?
~2. AVX2 comparison operations set all bits in a byte/word/... to `1` if the condition holds, so functions such as `_mm256_cmpeq_epi64` return `0/-1` instead of `0/1`. Should the result be masked with `1` to make sure it returns `0/1`?~
~Would I be correct if I assume that the code revised below is not yet activated, but will be after we port legacy code to ATen?~
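
To illustrate point 1, here is a minimal scalar sketch, not the actual code in `vec256_int.h`. It assumes a four-lane `std::array` as a stand-in for a 256-bit register, and the helpers `Lanes`, `cmpeq`, `bit_xor`, `set1`, `ne_xor_zero`, and `ne_xor_ones` are hypothetical names that only model the lane-wise behavior of `_mm256_cmpeq_epi64`, `_mm256_xor_si256`, and `_mm256_set1_epi64x`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 256-bit register with four 64-bit lanes.
using Lanes = std::array<std::int64_t, 4>;

// Models _mm256_cmpeq_epi64: a lane is all ones (-1) if equal, otherwise 0.
Lanes cmpeq(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = (a[i] == b[i]) ? -1 : 0;
    }
    return r;
}

// Models _mm256_xor_si256: lane-wise bitwise XOR.
Lanes bit_xor(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = a[i] ^ b[i];
    }
    return r;
}

// Models _mm256_set1_epi64x: broadcast one value to every lane.
Lanes set1(std::int64_t v) {
    Lanes r{};
    r.fill(v);
    return r;
}

// "Not equal" built by XOR with 0: the equality mask comes back unchanged.
Lanes ne_xor_zero(const Lanes& a, const Lanes& b) {
    return bit_xor(set1(0), cmpeq(a, b));
}

// "Not equal" built by XOR with all ones: every bit of the mask is flipped.
Lanes ne_xor_ones(const Lanes& a, const Lanes& b) {
    return bit_xor(set1(-1), cmpeq(a, b));
}
```

In this model, `ne_xor_zero` returns the same mask as `cmpeq`, so it still marks the equal lanes, while `ne_xor_ones` marks the lanes that differ.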
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15659
Differential Revision: D13565929
Pulled By: mrshenli
fbshipit-source-id: 8ae3daf256c3d915dd855a2215c95275e899ea8c