[X86] Add i256/i512 CTPOP expansion on AVX512VPOPCNTDQ targets (#182830)
If the i256/i512 value can be freely folded to the FPU, we can use
VPOPCNTQ to compute a per-element CTPOP and then perform an expanded
VECREDUCE_ADD: VPMOVQB truncates the v4i64/v8i64 counts to v16i8 with
the upper elements zeroed (each count is at most 64, so it fits in an
i8), and VPSADBW then sums the lower v8i8 elements.
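As a sanity check, the sequence above can be modelled in scalar C. This is a hypothetical sketch, not the actual DAG lowering. `popcount64`, `vpmovqb_model`, `vpsadbw_zero_model` and `ctpop_wide` are illustrative names that emulate the instruction semantics on an array of 64-bit limbs. VPSADBW is modelled in its 128-bit form against an all-zero operand, so each 64-bit result lane is simply the byte sum of its half.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count of one 64-bit limb (models one VPOPCNTQ lane). */
static uint64_t popcount64(uint64_t x) {
    uint64_t n = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Models VPMOVQB: truncate each 64-bit lane to a byte and zero the rest
   of the 16-byte result. */
static void vpmovqb_model(const uint64_t *src, int lanes, uint8_t dst[16]) {
    for (int i = 0; i < 16; ++i)
        dst[i] = 0;
    for (int i = 0; i < lanes; ++i)
        dst[i] = (uint8_t)src[i];
}

/* Models 128-bit VPSADBW against zero: each 64-bit result lane is the
   sum of the 8 bytes in the corresponding half. */
static void vpsadbw_zero_model(const uint8_t src[16], uint64_t dst[2]) {
    for (int half = 0; half < 2; ++half) {
        uint64_t sum = 0;
        for (int j = 0; j < 8; ++j)
            sum += src[half * 8 + j]; /* |x - 0| == x */
        dst[half] = sum;
    }
}

/* CTPOP of an i256 (lanes == 4) or i512 (lanes == 8) value held as
   64-bit limbs. */
static uint64_t ctpop_wide(const uint64_t *limbs, int lanes) {
    assert(lanes == 4 || lanes == 8);
    uint64_t counts[8];
    for (int i = 0; i < lanes; ++i)
        counts[i] = popcount64(limbs[i]); /* per-element CTPOP (VPOPCNTQ) */
    uint8_t bytes[16];
    vpmovqb_model(counts, lanes, bytes); /* each count <= 64 fits in a byte */
    uint64_t sad[2];
    vpsadbw_zero_model(bytes, sad);      /* bytes 8..15 are zero, so sad[1] == 0 */
    return sad[0];
}
```

For example, an all-ones i512 gives 512, and the i256 limbs {1, 3, 0xFF, 0} give 1 + 2 + 8 + 0 = 11. The total never exceeds 512, so it fits easily in the 64-bit lane that VPSADBW produces.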
Fixes #182829