Use templates instead of macros when defining Vec256<BFloat16> binary operators (#35844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35844
Also, bitwise operators can operate on the underlying __m256i
representation directly instead of making expensive conversions to
float and back.
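
A minimal, portable sketch of the two ideas, not the actual Vec256 code.
It assumes a hypothetical BF16Vec type that stores 16 raw bfloat16 bit
patterns in a plain array (standing in for the __m256i register), and the
helper names bf16_to_float, float_to_bf16 and binary_op_via_float are
made up for illustration. The float-to-bfloat16 step truncates to keep the
sketch short; production code would round and handle NaN.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Vec256<BFloat16>: 16 lanes of raw bfloat16 bits.
struct BF16Vec {
  std::array<std::uint16_t, 16> bits{};
};

// A bfloat16 value is the upper 16 bits of an IEEE-754 float32.
inline float bf16_to_float(std::uint16_t b) {
  std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Truncating conversion (assumption: simplified, no rounding or NaN handling).
inline std::uint16_t float_to_bf16(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return static_cast<std::uint16_t>(u >> 16);
}

// One function template replaces a macro expanded once per operator:
// widen each lane to float, apply op, narrow the result back.
template <typename Op>
BF16Vec binary_op_via_float(const BF16Vec& a, const BF16Vec& b, Op op) {
  BF16Vec out;
  for (std::size_t i = 0; i < out.bits.size(); ++i) {
    out.bits[i] =
        float_to_bf16(op(bf16_to_float(a.bits[i]), bf16_to_float(b.bits[i])));
  }
  return out;
}

inline BF16Vec operator+(const BF16Vec& a, const BF16Vec& b) {
  return binary_op_via_float(a, b, [](float x, float y) { return x + y; });
}

inline BF16Vec operator*(const BF16Vec& a, const BF16Vec& b) {
  return binary_op_via_float(a, b, [](float x, float y) { return x * y; });
}

// Bitwise operators need no conversion: they act on the stored bits directly.
inline BF16Vec operator&(const BF16Vec& a, const BF16Vec& b) {
  BF16Vec out;
  for (std::size_t i = 0; i < out.bits.size(); ++i) {
    out.bits[i] = static_cast<std::uint16_t>(a.bits[i] & b.bits[i]);
  }
  return out;
}

inline BF16Vec operator^(const BF16Vec& a, const BF16Vec& b) {
  BF16Vec out;
  for (std::size_t i = 0; i < out.bits.size(); ++i) {
    out.bits[i] = static_cast<std::uint16_t>(a.bits[i] ^ b.bits[i]);
  }
  return out;
}
```

With a template, the arithmetic operators share one type-checked helper
that a debugger can step through, while the bitwise operators skip the
widen/narrow round trip entirely.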
Test Plan: Imported from OSS
Differential Revision: D20927639
Pulled By: ngimel
fbshipit-source-id: 148c503df090580c8504f0df8d6ed2648d614120