llvm-project
e8219e5c - [VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690)

Commit
145 days ago
[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690) In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we unprofitably loop vectorize on RISC-V. The loop looks something like: ```c for (int i = 0; i < n; i++) { if (x0[i] == a) if (x1[i] == b) if (x2[i] == c) // do stuff... } ``` Because it's so deeply nested the actual inner level of the loop rarely gets executed. However we still deem it profitable to vectorize, which due to the if-conversion means we now always execute the body. This stems from the fact that `getPredBlockCostDivisor` currently assumes that blocks have 50% chance of being executed as a heuristic. We can fix this by using BlockFrequencyInfo, which gives a more accurate estimate of the innermost block being executed 12.5% of the time. We can then calculate the probability as `HeaderFrequency / BlockFrequency`. Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V. Whilst there's a lot of changes in the in-tree tests, this doesn't affect llvm-test-suite or SPEC CPU 2017 that much: - On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. - On x86-64 -flto -O3 **with PGO** there's 0.9%/0% less geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO: https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au
Author
Parents
Loading