[X86] lowerShuffleAsBroadcast - improve handling of non-zero element index broadcasts
On AVX2+, support broadcasting any element that lies in the bottom 128-bit subvector by first shuffling it down to element 0 and then broadcasting from element 0.
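
For illustration only, the two-step idea can be modelled on plain arrays. This is a scalar sketch, not the code in lowerShuffleAsBroadcast: the helper names (eltsPer128, inLow128, broadcastViaElt0) are hypothetical, and a vector register is assumed to be representable as a std::vector<int> of lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model; none of these helpers exist in LLVM.

// Number of elements that fit in the bottom 128-bit subvector.
std::size_t eltsPer128(std::size_t eltBits) { return 128 / eltBits; }

// True if element `idx` lies in the bottom 128-bit subvector.
bool inLow128(std::size_t idx, std::size_t eltBits) {
  return idx < eltsPer128(eltBits);
}

// Broadcast src[idx] to every lane by first moving it to element 0.
std::vector<int> broadcastViaElt0(const std::vector<int> &src,
                                  std::size_t idx, std::size_t eltBits) {
  assert(idx < src.size() && inLow128(idx, eltBits));

  // Step 1: shuffle within the low 128 bits so the element sits at index 0.
  std::size_t n = std::min(eltsPer128(eltBits), src.size());
  std::vector<int> low(src.begin(),
                       src.begin() + static_cast<std::ptrdiff_t>(n));
  low[0] = src[idx];

  // Step 2: broadcast element 0 across the full-width result.
  return std::vector<int>(src.size(), low[0]);
}
```

In this model a v8i32 source with index 3 qualifies (four 32-bit elements fit in 128 bits), while index 4 does not and would need a different lowering.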
Fixes #113396