llvm-project
f1b42dcc - [LV] Vectorize early exit loops with stores using masking (#178454)

Commit
6 days ago
[LV] Vectorize early exit loops with stores using masking (#178454)

This is an alternative approach to vectorizing early exit loops with stores that avoids needing to add an extra check block. This is a fairly straightforward approach that should work on vector ISAs supporting masked memory ops.

The basic approach is to create a mask covering all lanes _before_ any exiting lane, using cttz.elts and active.lane.mask (which sets all lanes to true if the uncountable exit wasn't taken). If the uncountable exit was taken, then there will still be one scalar iteration left to perform after the vector loop, which will also handle which exit block we should branch to.

We no longer need to advance exit conditions in the vector body to the next iteration (compared to the other PR), though we still need to move the recipes needed to generate the exit condition (depending on which memory operations are first in the loop).

The advantage this has over a full in-loop mask approach is that we don't need to form intermediate masks for each uncountable exit; while I haven't tried to mix this with the ongoing multiple-exit work yet, we should be able to handle them without increasing the amount of generated per-exit code. We also won't need to unpick which exit condition was met first.

For a pseudo-C example of the transformation (with S1 and S2 representing statements with a side effect, like stores, or possibly a load that may fault if continued past the early exit), given the following scalar loop:

```c
for (i = 0; i < N; ++i) {
  S1;
  if (a[i] == threshold)
    break;
  S2;
}
```

we would have a vector loop and scalar tail like the following:

```c
int i = 0;
for (; i < vecN; i += VF) {
  // Move load for uncountable exit condition before other
  // operations in the loop.
  vecA = a[i]...a[i+VF-1];

  // Create mask for all lanes _before_ any uncountable exit.
  vecCmp = vecA == splat(threshold);
  mask = get.active.lane.mask(0, cttz.elts(vecCmp));

  // Execute statements with side effects using the mask
  vecS1(mask);
  vecS2(mask);

  // If there was an uncountable exit, increase IV by the number
  // of elements in the mask, and bail out to the scalar tail.
  if (any_of(vecCmp)) {
    i += cttz.elts(vecCmp);
    break;
  }
}

// Scalar tail handles remaining iterations, plus any differences
// in exit block for different exits.
for (; i < N; ++i) {
  S1;
  if (a[i] == threshold)
    break;
  S2;
}
```

For the mask, given a comparison result of `<0, 0, 1, 0>`, we would expect a mask of `<1, 1, 0, 0>`.
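
To make the mask semantics concrete, here is a minimal sketch in plain C that emulates the scheme with ordinary loops over lanes. Everything in it is an assumption for illustration: the fixed `VF` of 4, the helper names `first_true_lane` and `active_lane_mask` (stand-ins for cttz.elts and get.active.lane.mask, not LLVM APIs), and the concrete bodies chosen for S1 and S2. It is not the code the vectorizer emits; it only checks that the masked loop plus scalar tail should leave memory in the same state as the scalar loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* assumed vectorization factor, chosen for illustration */

/* Stand-in for cttz.elts: index of the first true lane, or VF if none. */
static size_t first_true_lane(const bool cmp[VF]) {
    for (size_t lane = 0; lane < VF; ++lane)
        if (cmp[lane])
            return lane;
    return VF;
}

/* Stand-in for get.active.lane.mask(0, n): lane l is active iff l < n. */
static void active_lane_mask(size_t n, bool mask[VF]) {
    assert(n <= VF);
    for (size_t lane = 0; lane < VF; ++lane)
        mask[lane] = lane < n;
}

/* Reference scalar loop. S1 and S2 are hypothetical stores. */
static size_t scalar_loop(const int *a, int *s1, int *s2, size_t n,
                          int threshold) {
    size_t i = 0;
    for (; i < n; ++i) {
        s1[i] = a[i] + 1; /* S1 */
        if (a[i] == threshold)
            break;
        s2[i] = a[i] * 2; /* S2 */
    }
    return i;
}

/* Emulated vector loop with masked side effects, then the scalar tail. */
static size_t masked_loop(const int *a, int *s1, int *s2, size_t n,
                          int threshold) {
    size_t vec_n = n - n % VF; /* iterations covered by whole vectors */
    size_t i = 0;
    for (; i < vec_n; i += VF) {
        bool cmp[VF], mask[VF];
        /* Exit condition is evaluated first, for all lanes. */
        for (size_t lane = 0; lane < VF; ++lane)
            cmp[lane] = a[i + lane] == threshold;

        /* Mask covers only the lanes before the first exiting lane. */
        size_t first = first_true_lane(cmp);
        active_lane_mask(first, mask);

        /* S1 and S2 run under the same mask. */
        for (size_t lane = 0; lane < VF; ++lane) {
            if (mask[lane]) {
                s1[i + lane] = a[i + lane] + 1;
                s2[i + lane] = a[i + lane] * 2;
            }
        }

        /* Exit taken: advance to the exiting iteration and bail out. */
        if (first < VF) {
            i += first;
            break;
        }
    }
    /* Scalar tail: performs the exiting iteration (S1, then break) or
       the remainder iterations when no exit was taken. */
    for (; i < n; ++i) {
        s1[i] = a[i] + 1;
        if (a[i] == threshold)
            break;
        s2[i] = a[i] * 2;
    }
    return i;
}
```

Note that the exiting lane itself is deliberately left inactive in the vector body: its S1 still has to run, and that is the one scalar iteration the tail performs before taking the exit.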