pytorch
ced5c89b - add explicit vectorization for Half dtype on CPU (#96076)

Commit

1 year ago

add explicit vectorization for Half dtype on CPU (#96076)

This patch is part of the half float performance optimization on CPU:

* add a specialization for dtype `Half` in `Vectorized<>` under both avx256 and avx512.
* add a specialization for dtype `Half` in functional utils, e.g. `vec::map_reduce<>()`, which uses float32 as the accumulate type. Also add a helper struct `vec_hold_type<scalar_t>`, because `Vectorized<Half>::value_type` refers to its underlying storage type, `uint16_t`, which causes errors if a kernel uses `Vec::value_type`.

In `Vectorized<>`, Half uses the same logic as BFloat16: each Half vector is mapped to two float vectors for computation.

Note that this patch modifies the CMake files to add **-mf16c** to the AVX2 build. According to https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html, all hardware platforms that support **avx2** also support **f16c**.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96076
Approved by: https://github.com/malfet
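The commit message combines three ideas: widening one Half vector into two float vectors, accumulating in float32 inside `vec::map_reduce<>()`, and a hold-type trait that works around `value_type` being `uint16_t`. Below is a minimal portable sketch of all three, assuming a 256-bit (AVX2-width) register, so one Half vector has 16 lanes and one float vector has 8. `Half`, `VecSketch`, `hold_type`, `half16_to_float8x2`, and `map_reduce_half` are hypothetical stand-ins, not PyTorch's `c10::Half` or `at::vec` API. The scalar loops only model the lane layout; on real hardware the widening would be done by an F16C conversion instruction such as `vcvtph2ps`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for c10::Half: just the raw IEEE 754 binary16 bits.
struct Half { std::uint16_t x; };

// Hypothetical stand-in for Vectorized<T>; only value_type matters here.
template <typename T> struct VecSketch { using value_type = T; };
// As the commit notes for Vectorized<Half>, value_type is the storage type.
template <> struct VecSketch<Half> { using value_type = std::uint16_t; };

// Hypothetical trait in the spirit of vec_hold_type<scalar_t>: it recovers the
// logical scalar type that Vec::value_type hides for Half.
template <typename Vec> struct hold_type { using type = typename Vec::value_type; };
template <> struct hold_type<VecSketch<Half>> { using type = Half; };
template <typename Vec> using hold_type_t = typename hold_type<Vec>::type;

static_assert(std::is_same<VecSketch<Half>::value_type, std::uint16_t>::value, "storage type");
static_assert(std::is_same<hold_type_t<VecSketch<Half>>, Half>::value, "logical type");
static_assert(std::is_same<hold_type_t<VecSketch<float>>, float>::value, "default case");

// Decode binary16 bits to float (zero, subnormal, normal, inf/NaN).
inline float half_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  const std::uint32_t mant = h & 0x3FFu;
  if (exp == 0) {  // zero or subnormal: value is mant * 2^-24
    const float mag = std::ldexp(static_cast<float>(mant), -24);
    return sign ? -mag : mag;
  }
  const std::uint32_t bits = (exp == 31)
      ? (sign | 0x7F800000u | (mant << 13))           // inf / NaN
      : (sign | ((exp + 112u) << 23) | (mant << 13));  // rebias exponent 15 -> 127
  float out;
  std::memcpy(&out, &bits, sizeof out);
  return out;
}

using Float8 = std::array<float, 8>;  // models one 256-bit float vector

// Widen one 16-lane Half "vector" into two 8-lane float vectors.
inline void half16_to_float8x2(const Half* src, Float8& lo, Float8& hi) {
  for (int j = 0; j < 8; ++j) {
    lo[j] = half_to_float(src[j].x);
    hi[j] = half_to_float(src[8 + j].x);
  }
}

// Hypothetical map_reduce over Half input with float32 accumulators.
// `init` must be the identity of `reduce` (e.g. 0.0f for +): every lane starts from it.
template <typename MapOp, typename ReduceOp>
float map_reduce_half(const Half* data, std::int64_t n, MapOp map, ReduceOp reduce, float init) {
  assert(n >= 0);
  Float8 acc_lo, acc_hi, lo, hi;
  acc_lo.fill(init);
  acc_hi.fill(init);
  std::int64_t i = 0;
  for (; i + 16 <= n; i += 16) {
    half16_to_float8x2(data + i, lo, hi);  // widen once, then compute in float
    for (int j = 0; j < 8; ++j) {
      acc_lo[j] = reduce(acc_lo[j], map(lo[j]));
      acc_hi[j] = reduce(acc_hi[j], map(hi[j]));
    }
  }
  float result = init;
  for (int j = 0; j < 8; ++j) result = reduce(result, reduce(acc_lo[j], acc_hi[j]));
  for (; i < n; ++i) result = reduce(result, map(half_to_float(data[i].x)));  // scalar tail
  return result;
}
```

For example, calling `map_reduce_half(data, n, [](float v) { return v * v; }, [](float a, float b) { return a + b; }, 0.0f)` computes a sum of squares in float32 even though the input is stored as 16-bit values. Accumulating in Half instead would lose precision quickly, since binary16 carries only 11 significand bits.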

Author

mingfeima

Committer

pytorchmergebot

Parents

c99895ca
