Remove unnecessary __at_align32__ in int_elementwise_binary_256 (#45470)
Summary:
The `__at_align32__` qualifiers were added in
4b3046ed286e92b5910769bf97f2bc6a1ad473d1 based on a misunderstanding of
`_mm256_storeu_si256`, but they are unnecessary. The [documentation][1]
for `_mm256_storeu_si256` says:
> Moves values from a integer vector to an **unaligned** memory location.
Because the store does not require a 32-byte-aligned destination, it's
better to remove the `__at_align32__` qualifier and give the compiler and
linker more flexibility to optimize.
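
As a rough illustration of the point above, here is a minimal standalone sketch of the store/compute/load pattern using plain, unqualified stack arrays. It is not the actual `int_elementwise_binary_256` code: the helper name `multiply_8x_int32` is hypothetical, and the `target("avx")` attribute assumes GCC or Clang on x86-64.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper for illustration only; not the PyTorch implementation.
// The target attribute (GCC/Clang) enables AVX for this one function, so the
// file can be compiled without -mavx. The CPU must still support AVX at runtime.
__attribute__((target("avx")))
void multiply_8x_int32(const int32_t* a, const int32_t* b, int32_t* out) {
  // Plain stack buffers with no 32-byte alignment qualifier. This is fine
  // because the "u" (unaligned) load/store intrinsics accept any address.
  int32_t values_a[8];
  int32_t values_b[8];

  __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
  __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));

  // Spill both vectors to the scalar buffers with unaligned stores.
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(values_a), va);
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(values_b), vb);

  // Apply the binary operation element by element.
  for (int i = 0; i != 8; i++) {
    values_a[i] *= values_b[i];
  }

  // Reload the result and write it to the (possibly unaligned) output.
  __m256i result =
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(values_a));
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), result);
}
```

Only the aligned variants (`_mm256_store_si256` / `_mm256_load_si256`) would require the buffers to be 32-byte aligned.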
[1]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-vector-extensions/intrinsics-for-load-and-store-operations-1/mm256-storeu-si256.html
Close https://github.com/pytorch/pytorch/issues/44810
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45470
Reviewed By: zhangguanheng66
Differential Revision: D23980060
Pulled By: glaringlee
fbshipit-source-id: 12b3558b76c6e81d88a72081060fdb8674464768