[AArch64] Add streaming-mode stack hazards. (#98956)
Under some SME contexts, a coprocessor with its own separate cache will
be used for FPR operations. This can create hazards if the CPU and the
SME unit try to access the same area of memory, including if the access
is to an area of the stack.
To try to alleviate that, this patch attempts to introduce extra padding
into the stack frame between FP and GPR accesses, controlled by the
StackHazardSize option. Without changing the layout of the stack frame,
a stack object of the right size is added between GPR and FPR CSRs.
Another is added to the stack objects section, and stack objects are
sorted so that FPR > Hazard padding slot > GPRs (where possible).
Unfortunately some things are not handled well (VLA area, FPR arguments
on the stack, object with both GPR and FPR accesses), but if those are
controlled by the user then the entire stack frame becomes GPR at the
start/end with FPR in the middle, surrounded by Hazard padding. This can
greatly help reduce something that can be difficult for the user to
control themselves.
The current implementation is opt-in through an
-aarch64-stack-hazard-size flag, and should have no effect if the option
is unset. In the long run the implementation might change (for example
using more base pointers to separate in more cases, re-enabling ldp/stp
using an extra register, etc), but this gets at least something for
people to use in llvm-19 if they need it. The only change whilst the
option is unset will be a fix for making sure the stack increment is
added at the right place when it cannot be converted to postinc
(++MBBI). I believe without extra padding that can not normally be
reached.