[AMDGPU] Add infrastructure for machine-level inliner

Add the necessary infrastructure for the machine-level inliner. The
inliner will initially only handle calls to functions with the
`amdgpu_gfx_whole_wave` calling convention. Partial inlining is
currently not supported: every whole wave function is inlined into all
of its call sites and then removed from the module. This should be safe
because whole wave functions can't be called indirectly and their
address can't be taken. As a consequence, recursive whole wave functions
are not supported yet (I'll fix that in a separate patch).

In addition to a MachineFunction pass representing the inliner itself,
the patch adds a custom FPPassManager (`AMDGPUInliningPassManager`)
which helps manage the inlining process. When the pipeline reaches the
inliner, this pass manager suspends further processing of the functions
that are going to be inlined, so they have the correct shape when the
inliner runs on their callers.
After the pass pipeline has run on all the functions in the module, the
custom pass manager finally releases the inlined MachineFunctions. In
the future, it should be easy to update it to run the remainder of the
pass pipeline on them instead of just deleting them, which would make it
possible to support partial inlining and, with it, recursion. The
suspension scheme works because the backend passes already run inside a
call graph pass manager, so callees are always processed before their
callers.
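
To illustrate the suspend-then-release flow described above, here is a
minimal, self-contained toy model. It is only a sketch and not the code
in this patch: `ToyFunction`, `ToyModule`, `ToyInliningPassManager` and
the step names are all hypothetical, and it assumes that a whole wave
function is parked right after the inliner has run on it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a MachineFunction (hypothetical, not an LLVM type).
struct ToyFunction {
  std::string Name;
  bool IsWholeWave = false;          // uses the whole wave calling convention
  std::vector<std::string> Callees;  // direct callees, by name
  std::vector<std::string> Log;      // steps applied to this function so far
};

using ToyModule = std::map<std::string, ToyFunction>;

// Toy stand-in for the custom pass manager (hypothetical).
struct ToyInliningPassManager {
  std::set<std::string> Suspended;  // whole wave functions parked at the inliner

  // Must be called in callee-before-caller order, which is what a call
  // graph pass manager provides.
  void runOnFunction(ToyFunction &F, const ToyModule &M) {
    F.Log.push_back("pre-inline-passes");

    // The "inliner": every whole wave callee must already be parked.
    for (const std::string &Callee : F.Callees) {
      if (!M.at(Callee).IsWholeWave)
        continue;
      assert(Suspended.count(Callee) && "callee not processed before caller");
      F.Log.push_back("inline:" + Callee);
    }

    // Park whole wave functions here instead of finishing the pipeline.
    if (F.IsWholeWave) {
      Suspended.insert(F.Name);
      return;
    }
    F.Log.push_back("post-inline-passes");
  }

  // Once the whole module has been processed, drop the parked functions.
  void releaseSuspended(ToyModule &M) {
    for (const std::string &Name : Suspended)
      M.erase(Name);
    Suspended.clear();
  }
};
```

In this model a caller only ever sees callees that stopped at the
inliner, and `releaseSuspended` is the single place where a later
version could resume the rest of the pipeline instead of erasing them.
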
The custom pass manager is inserted into the pipeline by another pass,
`AMDGPUInliningAnchor`, whose `preparePassManager` method will oust any
existing FunctionPass manager and replace it with the inlining pass
manager. This makes it possible to use the custom pass manager without
any other changes to the pass manager infrastructure.
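
The replacement step can be pictured with a small toy model of a pass
manager stack. This is a sketch under assumptions, not the actual
`preparePassManager` implementation: the `Toy*` types and `prepareStack`
are hypothetical and do not use the LLVM API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy pass manager hierarchy (hypothetical, not the LLVM classes).
struct ToyManager {
  virtual ~ToyManager() = default;
  virtual std::string kind() const = 0;
  virtual bool isFunctionLevel() const { return false; }
};

struct ToyCallGraphManager : ToyManager {
  std::string kind() const override { return "callgraph"; }
};

struct ToyFunctionManager : ToyManager {
  std::string kind() const override { return "function"; }
  bool isFunctionLevel() const override { return true; }
};

struct ToyInliningManager : ToyFunctionManager {
  std::string kind() const override { return "inlining"; }
};

// The back of the vector is the top of the stack.
using ToyStack = std::vector<std::unique_ptr<ToyManager>>;

// Hypothetical analogue of the anchor pass's hook: pop any plain
// function-level manager off the top and push the inlining one.
void prepareStack(ToyStack &Stack) {
  // Nothing to do if the inlining manager is already on top.
  if (!Stack.empty() && Stack.back()->kind() == "inlining")
    return;
  while (!Stack.empty() && Stack.back()->isFunctionLevel())
    Stack.pop_back();
  Stack.push_back(std::unique_ptr<ToyManager>(new ToyInliningManager()));
  assert(Stack.back()->kind() == "inlining");
}
```

Passes scheduled after the anchor then land in the inlining manager,
while the enclosing call graph manager stays untouched.
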
Support for the new pass manager will be part of a different patch.