[mlir][acc] Add ACCSpecializeForDevice and ACCSpecializeForHost passes (#173527)
Add two new transformation passes for specializing OpenACC IR for
different execution contexts:
ACCSpecializeForDevice:
- Strips OpenACC constructs that are invalid in device code
- Replaces data entry ops with their var operands
- Unwraps regions from compute/data constructs
- Erases runtime operations (init, shutdown, wait, etc.)
This pass is applicable in two contexts:
1. Functions marked with the `acc.specialized_routine` attribute, where
the entire function body is device code
2. Non-specialized functions, where patterns are applied only to `acc`
operations nested inside compute constructs (parallel, serial, kernels),
not to the constructs themselves
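
As a rough illustration of the two contexts above, the following self-contained toy model (not MLIR code) picks out which operations the device patterns would be applied to. `ToyOp`, `collectTargets`, and `deviceTargets` are hypothetical names invented for this sketch; the actual pass works on MLIR operations and its implementation may differ.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an operation with nested regions; NOT an MLIR type.
struct ToyOp {
  std::string name;        // e.g. "acc.parallel", "acc.loop", "arith.addi"
  std::vector<ToyOp> body; // nested operations
};

// Compute constructs as listed in the description above.
bool isComputeConstruct(const ToyOp &op) {
  return op.name == "acc.parallel" || op.name == "acc.serial" ||
         op.name == "acc.kernels";
}

bool isAccOp(const ToyOp &op) { return op.name.rfind("acc.", 0) == 0; }

// Record the names of `acc` ops that would be rewritten.
// Context 1 (specialized routine): every `acc` op is a target.
// Context 2 (otherwise): only `acc` ops nested inside a compute construct,
// never the outermost construct itself.
void collectTargets(const ToyOp &op, bool specializedRoutine,
                    bool insideCompute, std::vector<std::string> &out) {
  if (isAccOp(op) && (specializedRoutine || insideCompute))
    out.push_back(op.name);
  bool childInside = insideCompute || isComputeConstruct(op);
  for (const ToyOp &child : op.body)
    collectTargets(child, specializedRoutine, childInside, out);
}

std::vector<std::string> deviceTargets(const ToyOp &func,
                                       bool specializedRoutine) {
  std::vector<std::string> out;
  for (const ToyOp &op : func.body)
    collectTargets(op, specializedRoutine, /*insideCompute=*/false, out);
  return out;
}
```

In this model, a non-specialized function containing `acc.parallel { acc.loop }` yields only `acc.loop` as a target, while the same body in a specialized routine yields both operations.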
ACCSpecializeForHost:
- Converts orphan `acc` operations for host execution
- Transforms `acc.atomic.*` to load/store via `PointerLikeType`
interface
- Converts `acc.loop` to `scf.for` or `scf.execute_region`
- Replaces orphan data entry ops with their var operands
This pass operates in two modes:
1. Default (orphan) mode: Only converts `acc` operations that are not
inside or attached to compute regions. Used for host `acc routine`s
where compute constructs should be preserved.
2. Host fallback mode (`enable-host-fallback=true`): Converts all `acc`
operations, including compute constructs, data regions, and runtime ops.
This mode allows testing of the full conversion. These patterns will be
used to handle conditional host execution of `acc` regions with an `if`
clause.
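
The difference between the two modes can be sketched with a small self-contained toy predicate (not MLIR code). `ToyAccOp`, `isOrphanConvertible`, and `hostPassConverts` are hypothetical names for this illustration. The set of orphan-convertible ops is an assumption based on the bullets above (atomics, loops, data entry ops), with `acc.copyin` and `acc.create` standing in for data entry ops.

```cpp
#include <string>

// Toy description of an operation; NOT an MLIR type.
struct ToyAccOp {
  std::string name;               // e.g. "acc.loop", "acc.parallel"
  bool insideOrAttachedToCompute; // nested in, or an operand of, a compute op
};

bool startsWith(const std::string &s, const std::string &prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Assumed subset of ops handled in default (orphan) mode.
bool isOrphanConvertible(const std::string &name) {
  return name == "acc.loop" || startsWith(name, "acc.atomic.") ||
         name == "acc.copyin" || name == "acc.create";
}

// Returns true if the toy host pass would convert `op`.
bool hostPassConverts(const ToyAccOp &op, bool enableHostFallback) {
  if (!startsWith(op.name, "acc."))
    return false; // non-acc ops are left alone
  if (enableHostFallback)
    return true; // fallback mode: every acc op is converted
  // Default mode: only orphan ops outside compute regions are converted,
  // so compute constructs and their contents are preserved.
  return isOrphanConvertible(op.name) && !op.insideOrAttachedToCompute;
}
```

Under this model, an orphan `acc.loop` is converted in both modes, whereas `acc.parallel` is converted only when host fallback is enabled.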
The pattern population functions (populateACCSpecializeForDevice,
populateACCOrphanToHostPatterns, populateACCHostFallbackPatterns) are
exposed so other passes can reuse these patterns.
---------
Co-authored-by: Susan Tan <zujunt@nvidia.com>
Co-authored-by: Scott Manley <rscottmanley@gmail.com>