llvm-project
346185c4 - [AArch64] Improve codegen of vectorised early exit loops (#119534)

Commit

1 year ago

[AArch64] Improve codegen of vectorised early exit loops (#119534)

Once PR #112138 lands we are able to start vectorising more loops that
have uncountable early exits. The typical loop structure looks like this:

  vector.body:
    ...
    %pred = icmp eq <2 x ptr> %wide.load, %broadcast.splat
    ...
    %or.reduc = tail call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> %pred)
    %iv.cmp = icmp eq i64 %index.next, 4
    %exit.cond = or i1 %or.reduc, %iv.cmp
    br i1 %exit.cond, label %middle.split, label %vector.body

  middle.split:
    br i1 %or.reduc, label %found, label %notfound

  found:
    ret i64 1

  notfound:
    ret i64 0

The problem with this is that %or.reduc is kept live after the loop, and
since it is a boolean, keeping it live typically requires making a copy of
the condition code register. For AArch64 this requires an additional cset
instruction, which is quite expensive for a typical find loop that only
contains 6 or 7 instructions.

This patch attempts to improve the codegen by sinking the reduction out of
the loop to the location of its user. It is a lot cheaper to keep the
vector predicate (%pred) live instead, provided its type is legal and
there are plenty of registers available for it. There is a potential
downside in that a little more work is required after the loop, but I
believe this is worth it since we are likely to spend most of our time in
the loop.
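For orientation, the following is a minimal C++ sketch of the kind of source-level find loop that could lower to the vector.body / middle.split structure described in the commit message. The function name `contains`, its signature, and the fixed trip count of 4 are assumptions chosen to mirror the IR (pointer compares, `%index.next` compared against 4, results 1 and 0); they are not taken from the patch or its tests. Whether a given compiler actually vectorises this loop depends on its version, flags, and what it can prove about the memory accesses.

```cpp
#include <cstdint>

// Hypothetical find loop (illustrative only, not from the patch).
// The loop has two exits: the counted exit when i reaches 4, and an
// uncountable early exit taken as soon as a matching pointer is found.
std::int64_t contains(void *const *data, const void *needle) {
  for (int i = 0; i < 4; ++i) {
    if (data[i] == needle)
      return 1; // early exit: corresponds to the "found" block
  }
  return 0;     // counted exit: corresponds to the "notfound" block
}
```

In the vectorised form, the per-element comparison becomes the `%pred` vector and the early-exit test becomes the or-reduction of that vector, which is the value the commit message is concerned with keeping live across the loop exit.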

References

#119534 - [AArch64] Improve codegen of vectorised early exit loops

Author

david-arm

Parents

8f17c908
