[AArch64][Machine-Combiner] Split gather patterns into neon regs to multiple vectors (#142941)
This change optimizes gather-like sequences, where values are loaded
separately into the lanes of a NEON vector. Each lane load inserts into
the same register and therefore depends on the previous one, so the
loads form a serial dependency chain. When performing multiple i32 loads
into a 128-bit vector, for example, it is more profitable to load into
separate vector registers and zip them.
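
The equivalence this relies on can be sketched in portable C++. This is
only an illustrative model, not the combiner code: V4, gather_serial,
gather_split and zip1_2d are hypothetical names, a 128-bit register is
modelled as four i32 lanes, and the zip is assumed to behave like ZIP1 on
two 64-bit elements.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of one 128-bit vector register as 4 x i32 lanes.
using V4 = std::array<std::uint32_t, 4>;

// Original shape: all four lane inserts target the same register, so each
// one depends on the previous (a chain of four dependent loads).
V4 gather_serial(const std::uint32_t *p0, const std::uint32_t *p1,
                 const std::uint32_t *p2, const std::uint32_t *p3) {
  V4 v{};
  v[0] = *p0;
  v[1] = *p1;
  v[2] = *p2;
  v[3] = *p3;
  return v;
}

// Models ZIP1 with a .2d arrangement: the low 64 bits of a followed by
// the low 64 bits of b.
V4 zip1_2d(const V4 &a, const V4 &b) { return {a[0], a[1], b[0], b[1]}; }

// Split shape: two independent chains of two loads each, into separate
// registers, combined by a single zip at the end.
V4 gather_split(const std::uint32_t *p0, const std::uint32_t *p1,
                const std::uint32_t *p2, const std::uint32_t *p3) {
  V4 lo{};
  V4 hi{};
  lo[0] = *p0;  // chain A
  lo[1] = *p1;
  hi[0] = *p2;  // chain B, independent of chain A
  hi[1] = *p3;
  return zip1_2d(lo, hi);
}
```

Both functions produce the same lanes; the split form trades one extra
zip for a dependency chain of half the length.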
rdar://151851094