[AArch64] Add tablegen patterns for i8 and i16 vector insert/extract pairs (#136091)
An i8 and i16 vector extract/insert has to go via a i32 to make sure the
types are legal. This patch adds patterns for extract from a i8/i16
vector, inserted into a i16/i32 vector. This avoids the round trip via a
GPR which can limit performance.