llvm-project
d60601bb - [AArch64] Fold scalar-to-vector shuffles into DUP/FMOV (#166962)

Commit
60 days ago
[AArch64] Fold scalar-to-vector shuffles into DUP/FMOV (#166962)

Previously, LLVM emitted inefficient instructions when the low lanes of a 128-bit vector were set to a scalar and the high bits were set to 0. This patch uses the fmov/dup instructions to set the low lanes to the required scalar and zero the high bits of the register.

E.g. in the worst case,

```
int8x16_t foo_s8(int8_t a) {
  int8x16_t b = vcombine_s8(vdup_n_s8(a), vdup_n_s8(0));
  return b;
}
```

LLVM would emit:

```
foo_s8(signed char):
        movi    v0.2d, #0000000000000000
        mov     v0.b[0], w0
        mov     v0.b[1], w0
        mov     v0.b[2], w0
        mov     v0.b[3], w0
        mov     v0.b[4], w0
        mov     v0.b[5], w0
        mov     v0.b[6], w0
        mov     v0.b[7], w0
        ret
```

This patch now emits:
- <2 x i64> from i64 -> fmov d0, x0
- <4 x i32> from i32 -> dup v0.2s, w0
- <8 x i16> from i16 -> dup v0.4h, w0
- <16 x i8> from i8 -> dup v0.8b, w0
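Why a single `dup v0.8b, w0` can replace the `movi` plus eight lane inserts may be easier to see with a small model. The sketch below is not part of the patch. It is a portable C illustration that assumes the usual AArch64 behaviour that writing the 64-bit form of a vector register clears the upper 64 bits. The type `vreg128` and the functions `ref_combine_dup_zero`, `model_dup_8b`, `model_fmov_d`, and `check_all_i8` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register as 16 byte lanes
   (lane 0 is the least significant byte). */
typedef struct { uint8_t lane[16]; } vreg128;

/* Reference semantics of vcombine_s8(vdup_n_s8(a), vdup_n_s8(0)):
   lanes 0..7 hold a, lanes 8..15 hold zero. */
vreg128 ref_combine_dup_zero(int8_t a) {
    vreg128 r;
    for (int i = 0; i < 16; ++i)
        r.lane[i] = (i < 8) ? (uint8_t)a : 0;
    return r;
}

/* Model of `dup v0.8b, w0`, assuming a write to the 64-bit vector
   form zeroes the upper 64 bits of the 128-bit register. */
vreg128 model_dup_8b(uint32_t w0) {
    vreg128 r;
    memset(r.lane, 0, sizeof r.lane);      /* upper half cleared */
    for (int i = 0; i < 8; ++i)
        r.lane[i] = (uint8_t)(w0 & 0xff);  /* low byte of w0 replicated */
    return r;
}

/* Model of `fmov d0, x0` under the same assumption: the 64-bit value
   lands in the low half and the high half is zero. */
vreg128 model_fmov_d(uint64_t x0) {
    vreg128 r;
    memset(r.lane, 0, sizeof r.lane);
    for (int i = 0; i < 8; ++i)
        r.lane[i] = (uint8_t)(x0 >> (8 * i));
    return r;
}

/* Check that the one-instruction model matches the reference for
   every possible int8_t input. */
void check_all_i8(void) {
    for (int v = -128; v <= 127; ++v) {
        vreg128 want = ref_combine_dup_zero((int8_t)v);
        vreg128 got  = model_dup_8b((uint32_t)v);
        assert(memcmp(want.lane, got.lane, sizeof want.lane) == 0);
    }
}
```

Calling `check_all_i8()` exercises all 256 inputs. The same reasoning would apply to the `dup v0.4h` and `dup v0.2s` forms with wider lanes.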