[AMDGPU][Doc] GFX12.5 Barrier Execution Model (#185632)
- Document GFX12.5-specific intrinsics.
- Rename signal -> arrive, leave -> drop to match C++ terminology.
- Update the execution model to support GFX12.5 semantics (e.g. threads
  can arrive without waiting).
- Various clean-ups & wording updates to the model.
- Add "mutually exclusive" barrier objects.
- Add barrier-phase-with + related constraints.
- Document that barriers can exist at cluster scope too.
- Update GFX12 target semantics/code sequences to include GFX12.5.
The model is no longer marked as incomplete; it is now just
experimental.
More updates are planned to support more features and to address some
known shortcomings of the model. For example, many relations currently
encode too much semantic information, which means the model doesn't
build when barriers aren't used correctly. I'd like the model to
eventually represent broken executions as well, just like a memory
model can.