[AArch64] Reduce the cost of repeated sub-shuffle (#139331)

A shuffle that is wider than the legal vector type is split into multiple
legal-sized sub-shuffles, and its cost is the sum of the sub-shuffle costs.
This adds a check when summing the costs of the sub-shuffles so that a
repeated sub-shuffle is only counted once, rather than once per occurrence.
This especially reduces the cost of broadcasts/splats, where the sub-parts
are typically identical.
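
As a rough illustration of the idea (not the code in this patch), the sketch
below splits a mask into legal-sized chunks and charges each distinct chunk
once. The names dedupedShuffleCost and kSubShuffleCost are hypothetical, the
flat per-chunk cost is an assumption, and the model ignores which source
registers each chunk reads, which a real cost model would need to consider.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Assumption: every distinct sub-shuffle has the same flat cost.
constexpr int kSubShuffleCost = 1;

// Hypothetical helper: split `mask` into chunks of `legalElts` elements and
// sum the chunk costs, charging each distinct chunk mask only once.
int dedupedShuffleCost(const std::vector<int> &mask, std::size_t legalElts) {
  assert(legalElts > 0 && "legal vector width must be non-zero");
  std::set<std::vector<int>> seen; // chunk masks already paid for
  int cost = 0;
  for (std::size_t i = 0; i < mask.size(); i += legalElts) {
    std::size_t end = std::min(i + legalElts, mask.size());
    std::vector<int> chunk(mask.begin() + i, mask.begin() + end);
    // insert() reports whether the chunk was new; repeats add no cost.
    if (seen.insert(chunk).second)
      cost += kSubShuffleCost;
  }
  return cost;
}
```

Under these assumptions, an 8-element splat mask split into 4-element chunks
yields the same chunk twice and is charged once, while an 8-element reverse
mask yields two different chunks and is charged twice.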