@swift-ci please benchmark
@swift-ci Please Apple Silicon benchmark
@swift-ci please test
@swift-ci please test source compatibility
@swift-ci please test windows platform
------- Performance (arm64): -O -------
REGRESSION OLD NEW DELTA RATIO
ArrayAppendGenericStructs 866.0 997.5 +15.2% **0.87x (?)**
IMPROVEMENT OLD NEW DELTA RATIO
StringWithCString2 0.001 0.0 -50.0% **2.00x (?)**
SIMDReduce.Int32x16.Initializer 13.409 11.135 -17.0% **1.20x (?)**
MapReduceLazyCollectionShort 21.83 19.731 -9.6% **1.11x (?)**
------- Code size: -O -------
REGRESSION OLD NEW DELTA RATIO
RangeReplaceableCollectionPlusDefault.o 5835 6067 +4.0% **0.96x**
IMPROVEMENT OLD NEW DELTA RATIO
DictOfArraysToArrayOfDicts.o 18270 17874 -2.2% **1.02x**
------- Performance (arm64): -Osize -------
------- Code size: -Osize -------
IMPROVEMENT OLD NEW DELTA RATIO
DictOfArraysToArrayOfDicts.o 15329 15177 -1.0% **1.01x**
------- Performance (arm64): -Onone -------
REGRESSION OLD NEW DELTA RATIO
ArrayAppendToGeneric 226.774 351.29 +54.9% **0.65x (?)**
StringWithCString2 0.004 0.005 +20.0% **0.83x (?)**
IMPROVEMENT OLD NEW DELTA RATIO
ArrayAppendGenericStructs 872.308 700.0 -19.8% **1.25x (?)**
------- Code size: -swiftlibs -------
------- Performance (x86_64): -O -------
REGRESSION OLD NEW DELTA RATIO
MapReduceAnyCollection 88.0 103.444 +17.5% **0.85x**
MapReduceClass2 10.916 12.667 +16.0% **0.86x (?)**
IMPROVEMENT OLD NEW DELTA RATIO
MapReduceLazyCollectionShort 32.667 26.152 -19.9% **1.25x**
FlattenListLoop 1626.0 1386.0 -14.8% **1.17x (?)**
MapReduceLazySequence 11.427 9.852 -13.8% **1.16x**
FlattenListFlatMap 4568.0 4024.0 -11.9% **1.14x (?)**
------- Code size: -O -------
REGRESSION OLD NEW DELTA RATIO
RangeReplaceableCollectionPlusDefault.o 7207 7447 +3.3% **0.97x**
IMPROVEMENT OLD NEW DELTA RATIO
DictOfArraysToArrayOfDicts.o 21871 21391 -2.2% **1.02x**
------- Performance (x86_64): -Osize -------
REGRESSION OLD NEW DELTA RATIO
MapReduceLazyCollectionShort 43.545 67.5 +55.0% **0.65x**
MapReduceLazySequence 44.115 65.944 +49.5% **0.67x**
Dictionary4 158.75 222.286 +40.0% **0.71x**
Dictionary4OfObjects 192.625 254.333 +32.0% **0.76x (?)**
StringFromLongWholeSubstring 2.827 3.042 +7.6% **0.93x (?)**
IMPROVEMENT OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSDate 3918.0 2680.0 -31.6% **1.46x (?)**
------- Code size: -Osize -------
IMPROVEMENT OLD NEW DELTA RATIO
StringRemoveDupes.o 3369 3330 -1.2% **1.01x**
------- Performance (x86_64): -Onone -------
REGRESSION OLD NEW DELTA RATIO
ArrayAppendGenericStructs 946.667 1316.667 +39.1% **0.72x (?)**
------- Code size: -swiftlibs -------
2158 | } | ||
2159 | |||
2160 | void MemoryToRegisters::canonicalizeValueLifetimes( | ||
2161 | StackList<SILValue> &owned, StackList<SILValue> &guaranteed) { |
What is the difference between calling canonicalize here v/s running a round of copy propagation ? I thought we canonicalize a value in a pass for cleanup or if it enables additional optimization, like in silcombine ?
Yeah, adding a run of CopyPropagation was the first solution that came to mind after noticing that there were values whose copy-extended lifetime could newly be canonicalized after Mem2Reg but weren't before OME runs. This alternative came out of discussion with @atrick.
It might be reasonable to have as a principle that a pass which introduces a new value ought to be responsible for canonicalizing it's lifetime. That could be used to justify canonicalizing the lifetimes of the new phis.
For the stored values, we'd have to say something like "passes that see that they've created opportunities to canonicalize copy-extended lifetimes can do so". It seems kind of a shame to toss in a pass to the pipeline to rediscover opportunities for canonicalization that were known about already.
If we think about the CanonicalizeOSSALifetime and CanonicalizeBorrowScope as utilities that to be called from anywhere rather than as a part of the CopyPropagation pass, this perspective might become more appealing.
Yeah, you are right it feels wasteful to throw away newly discovered opportunities. But this also seems to make passes complex and might make bisection slightly harder. I am conflicted/ambivalent about this.
A flag to disable the canonicalization might address bisection however.
Good call!
Yeah, you are right it feels wasteful to throw away newly discovered opportunities. But this also seems to make passes complex and might make bisection slightly harder.
These are all important points. In this case, we've already established elsewhere that we need a very simple and efficient OSSA canonicalization utility that runs on specific values after a pass that exposes copies.
We may have discussed before that there are two kinds of borrow canonicalization: (1) a trivial utility that runs on SILFunction argument and basically does that same thing as owned canonicalization and (2) a fairly involved analysis and rewriting of borrow scopes. We should make it more clear in the API that only the first form should be called as a cleanup utility and that the second form should only run during the dedicated CopyPropagation pass.
LGTM! A flag to disable canonicalization would be helpful - but that can be a follow up.
Looks reasonable to me. I think we talked about separating the simple CanonicalizeBorrowScope utility from all the other copy propagation stuff, eventually...
I think this warrants a brief explanation. Like...
Record stored values because promoting the store exposes a copy of the stored value. After promotion, OSSA canonicalization runs on these value to cleanup unnecessary copies.
Done.
This is slightly mysterious. You may want to explain that, each new phi generated by Mem2Reg of alloc_stack will be the only new phi in a block at the end of the block's arguments list.
Done.
@meg-gupta Added the flag sil-mem2reg-disable-lifetime-canonicalization
.
@swift-ci please smoke test and merge
Login to write a write a comment.
Canonicalize the lifetimes of values which were stored to the promoted
alloc_stack
and of the newly created phis.