[AArch64] New pass for code layout optimizations. (#184434)
This pass is intended to optimize code layout prior to AsmPrinter. The
initial version handles two known cases:
I. FCMP-FCSEL
II. CMP/CMN-CSEL, 32-bit only
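For illustration only, here is source code of the kind that typically lowers to these pairs on AArch64 when optimization is enabled. The function names `selectFloat` and `selectInt` are hypothetical and do not appear in the patch, and the instruction sequences in the comments are the usual lowering, not a guarantee.

```cpp
// Hypothetical examples, not part of the patch.

// A floating-point compare feeding a select usually becomes an
// FCMP followed by an FCSEL (case I).
float selectFloat(float A, float B, float X, float Y) {
  return A > B ? X : Y; // typically: fcmp s0, s1 ; fcsel s0, s2, s3, gt
}

// A 32-bit integer compare feeding a select usually becomes a
// CMP followed by a CSEL on W registers (case II).
int selectInt(int A, int B, int X, int Y) {
  return A < B ? X : Y; // typically: cmp w0, w1 ; csel w0, w2, w3, lt
}
```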
Using existing directives, the pass induces function alignment (64 bytes
by default) when a pair is detected, and may additionally induce block
alignment of up to 4 bytes if the pair would otherwise straddle a cache
line.
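The straddle condition can be sketched as follows. This is a minimal illustration, not the pass's actual logic. It assumes a 64-byte cache line, fixed 4-byte instructions, and an offset measured from a cache-line-aligned function start. The names `pairStraddlesCacheLine` and `paddingToAvoidStraddle` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants, not taken from the patch: a 64-byte cache line and
// fixed 4-byte AArch64 instructions, so a compare/select pair spans 8 bytes.
constexpr uint64_t CacheLineBytes = 64;
constexpr uint64_t InstrBytes = 4;
constexpr uint64_t PairBytes = 2 * InstrBytes;

// Returns true if a pair starting at Offset (in bytes from a
// cache-line-aligned function start) touches two different cache lines.
inline bool pairStraddlesCacheLine(uint64_t Offset) {
  assert(Offset % InstrBytes == 0 && "instructions are 4-byte aligned");
  uint64_t FirstLine = Offset / CacheLineBytes;
  uint64_t LastLine = (Offset + PairBytes - 1) / CacheLineBytes;
  return FirstLine != LastLine;
}

// Returns the padding in bytes that would move a straddling pair to the
// start of the next cache line. Under the assumptions above this is
// either 0 or one instruction (4 bytes).
inline uint64_t paddingToAvoidStraddle(uint64_t Offset) {
  return pairStraddlesCacheLine(Offset) ? InstrBytes : 0;
}
```

Under these assumptions a pair straddles only when it starts 60 bytes into a line, which is why function alignment alone resolves most cases and at most 4 bytes of extra padding are needed.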
Beyond the performance improvement, this pass reduces noise due to code
layout and thus stabilizes measured performance over time. For example,
knock-on effects on a "sensitive function" won't be triggered by
codegen changes outside it.
The pass is enabled by default on processors with the new
`FeatureAlignCmpCSelPairs` subtarget feature (gated per sub-case by
`FeatureFuseCmpCSel` / `FeatureFuseFCmpFCSel`); each case can also be
forced through the `-aarch64-code-layout-opt` enumerated bit-mask option.
---------
Co-authored-by: Jon Roelofs <jroelofs@gmail.com>
rdar://171283264