[X86] optimize 512-bit masked truncated saturating stores (#179130)
Fixes an oversight in https://github.com/llvm/llvm-project/pull/169827:
the 512-bit version does not need the `vl` target feature.
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_mask_cvtsepi16_storeu_epi8&expand=1811&ig_expand=2150,2151
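
For reference, the behavior of the linked intrinsic can be modeled in scalar C. This is only a sketch of the semantics as the intrinsics guide describes them: the helper names `sat_i16_to_i8` and `mask_cvtsepi16_storeu_epi8_ref` are hypothetical and are not part of LLVM or the Intel headers, and the real intrinsic takes a `__mmask32` and a `__m512i` rather than a plain integer and an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed saturation: clamp a 16-bit value into the int8_t range. */
static int8_t sat_i16_to_i8(int16_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical scalar model of _mm512_mask_cvtsepi16_storeu_epi8:
 * a 512-bit vector holds 32 x i16 lanes; each lane whose mask bit is set
 * is saturated to i8 and stored, and the other bytes of dst are untouched. */
static void mask_cvtsepi16_storeu_epi8_ref(int8_t *dst, uint32_t mask,
                                           const int16_t src[32]) {
    for (size_t i = 0; i < 32; ++i) {
        if ((mask >> i) & 1u) {
            dst[i] = sat_i16_to_i8(src[i]);
        }
    }
}
```

The 512-bit form operates on a full ZMM register, which is why it should only depend on the base feature (AVX512BW for this intrinsic) and not on `vl`, which covers the 128-bit and 256-bit variants.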