llvm-project
0b3934d6 - [SROA] Avoid redundant `.oldload` generation when `memset` fully covers a partition (#179643)

32 days ago
In our internal (ByteDance) builds we frequently hit very large `DeadPhiWeb`s that cause serious compile-time slowdowns, especially in some auto-generated code where a single file can take 20+ minutes to compile.

There were previous attempts to reduce `DeadPhiWeb`s in `InstCombine` (e.g. llvm/llvm-project#108876 and llvm/llvm-project#158057), but in our workload a lot of time is still spent later in the pipeline (notably in `JumpThreading` and `CorrelatedValuePropagation`).

After digging into our cases, we found that a big chunk of the `DeadPhiWeb`s comes from SROA rewriting `memset`s. We often end up with patterns like:

```
%.sroa.xxx.oldload = load <ty>, ptr %.sroa.xxx
%unused = ptrtoint ptr %.sroa.xxx.oldload to i64 ; or a bitcast-like use
store <ty> <new_value>, ptr %.sroa.xxx
```

Even if `%unused` is cleaned up by later DCE-style passes, the load-before-store shape can still make `PromoteMem2Reg` conservatively treat many blocks as live-in when computing the iterated dominance frontier (IDF) for phi placement. With cyclic CFGs this can easily create large, sticky dead phi webs, and the rest of the pipeline pays for it.

The core issue is that `visitMemSetInst` used the slice's original offsets (`BeginOffset`/`EndOffset`) when deciding whether it needs to merge with an `.oldload` to preserve the bytes not written by the `memset`.

First, the original condition had a typo (`EndOffset != NewAllocaBeginOffset` instead of `EndOffset != NewAllocaEndOffset`). A slice that overlaps the partition necessarily ends after the partition begins, so this comparison is effectively always true, and the merge path was taken in most cases.

Second, even with the typo fixed, comparing the original slice range against the partition bounds is still too strict: when the `memset` contains the partition (e.g. a large `memset` over the whole alloca while the partition is just a subrange), the slice bounds differ from the partition bounds, so the write would still be misclassified as requiring an `.oldload`.

Both issues lead to many redundant loads and, downstream, to dead phi webs.
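To make the two problems concrete, here is a minimal C++ sketch that models three versions of the check on half-open byte ranges `[Begin, End)`: the original (with the typo), the typo-fixed one, and one that clips the slice to the partition first. `Range` and the `needsOldLoad*` functions are hypothetical names for illustration, not SROA code, and the clipping is assumed to be the usual max-of-begins / min-of-ends intersection.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model (not SROA code): byte offsets into the original alloca,
// expressed as half-open ranges [Begin, End).
struct Range {
  uint64_t Begin;
  uint64_t End;
};

// Original check, including the typo: the slice end is compared against the
// partition *begin*, which holds for any slice overlapping the partition.
constexpr bool needsOldLoadOriginal(Range Slice, Range Partition) {
  return Slice.Begin != Partition.Begin || Slice.End != Partition.Begin;
}

// Typo fixed, but still compares the unclipped slice range, so a memset that
// contains the partition is still treated as a partial write.
constexpr bool needsOldLoadTypoFixed(Range Slice, Range Partition) {
  return Slice.Begin != Partition.Begin || Slice.End != Partition.End;
}

// Clipped check: intersect the slice with the partition first (the analogue
// of NewBeginOffset/NewEndOffset), then compare against the partition bounds.
constexpr bool needsOldLoadClipped(Range Slice, Range Partition) {
  const uint64_t NewBegin = std::max(Slice.Begin, Partition.Begin);
  const uint64_t NewEnd = std::min(Slice.End, Partition.End);
  return NewBegin != Partition.Begin || NewEnd != Partition.End;
}
```

For a partition `[8, 16)`: a whole-alloca `memset` over `[0, 32)` is flagged by the first two checks but not by the clipped one; a `memset` exactly over `[8, 16)` is flagged only by the typo version; a `memset` over `[8, 12)` is flagged by all three, as it should be.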
This change switches the check to compare the already-computed intersection offsets (`NewBeginOffset`/`NewEndOffset`) against the partition bounds, so we generate an `.oldload` only when the `memset` writes just part of the partition:

```diff
-  if (IntTy && (BeginOffset != NewAllocaBeginOffset ||
-                EndOffset != NewAllocaBeginOffset)) {
+  if (IntTy && (NewBeginOffset != NewAllocaBeginOffset ||
+                NewEndOffset != NewAllocaEndOffset)) {
     // emit .oldload + insertInteger merge
   }
```

In our workload this eliminates many pointless `.oldload`s and helps reduce the size of the dead phi webs seen after `mem2reg`, improving compile time without changing semantics (partial overwrites still merge; full overwrites don't).
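Why a partial overwrite needs the old value, while a full one does not, can be sketched on plain integers. The following is a minimal C++ sketch, assuming an 8-byte partition promoted to a single `i64`, little-endian byte order, and a `memset` covering bytes `[Off, Off + Len)` with `Off + Len <= 8`. `mergeMemset` is a hypothetical helper that mirrors the mask-and-or idea behind the `insertInteger` merge; it is not the actual SROA implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): the value an 8-byte partition holds
// after a memset of `Byte` over bytes [Off, Off + Len). Assumes little-endian
// byte order and Off + Len <= 8.
constexpr uint64_t mergeMemset(uint64_t OldValue, unsigned Off, unsigned Len,
                               uint8_t Byte) {
  // Splat the memset byte across the written bytes, e.g. 0xAB, Len 2 -> 0xABAB.
  uint64_t Splat = 0;
  for (unsigned I = 0; I < Len; ++I)
    Splat = (Splat << 8) | Byte;

  // Full coverage: every byte is overwritten, so the old value is never
  // read (no ".oldload" needed).
  if (Off == 0 && Len == 8)
    return Splat;

  // Partial coverage: keep the untouched bytes of the old value and insert
  // the splat at its byte offset (here Len <= 7, so the shift is in range).
  const unsigned Shift = 8 * Off;
  const uint64_t Mask = ((uint64_t(1) << (8 * Len)) - 1) << Shift;
  return (OldValue & ~Mask) | (Splat << Shift);
}
```

When the `memset` covers the whole partition, the result depends only on the splatted byte, which is why the `.oldload` is redundant there. For a partial write such as `mergeMemset(0x1122334455667788, 2, 1, 0xAB)`, the untouched bytes come from the old value and the result is `0x1122334455AB7788`.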