[msan] Handle Arm NEON BFloat16 multiply-add to single-precision (#178510)
aarch64.neon.bfmlalb and aarch64.neon.bfmlalt behave like dot-products
after zeroing out the odd- and even-indexed values, respectively. We
handle them by generalizing handleVectorDotProductIntrinsic() and
(mis-)using getPclmulMask().
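
For illustration only, the dot-product view of these instructions can be
sketched as a small reference model. This is not the MSan instrumentation
itself: the names bfmlalDirect and bfmlalAsDot are hypothetical, plain float
stands in for bfloat16, and only the arithmetic is modeled, not shadow
propagation.

```cpp
#include <array>
#include <cstddef>

// Assumption: plain float stands in for bfloat16 to keep the sketch short.
using Vec8 = std::array<float, 8>; // models <8 x bfloat>
using Vec4 = std::array<float, 4>; // models <4 x float>

// Direct model: bfmlalb (top == false) multiplies the even-indexed lanes,
// bfmlalt (top == true) the odd-indexed lanes, accumulating into acc.
Vec4 bfmlalDirect(const Vec4 &acc, const Vec8 &a, const Vec8 &b, bool top) {
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    std::size_t lane = 2 * i + (top ? 1 : 0);
    out[i] = acc[i] + a[lane] * b[lane];
  }
  return out;
}

// Dot-product view: zero the unused lane of each pair in one operand, then
// take a two-element dot-product per output lane.
Vec4 bfmlalAsDot(const Vec4 &acc, Vec8 a, const Vec8 &b, bool top) {
  for (std::size_t i = 0; i < 8; ++i) {
    bool isOdd = (i % 2) == 1;
    if (isOdd != top)
      a[i] = 0.0f; // zero odd lanes for bfmlalb, even lanes for bfmlalt
  }
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = acc[i] + a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
  return out;
}
```

For finite inputs the two functions agree, which is why a generalized
dot-product handler plus a lane mask can plausibly cover both intrinsics.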