llvm-project
3dfb7823 - [AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345)

Commit
68 days ago
Reference issue: https://github.com/ROCm/llvm-project/issues/67

This patch adds support for expanding s_waitcnt instructions into sequences with decreasing counter values, enabling PC-sampling profilers to identify which specific memory operation is causing a stall.

This is controlled via:

- Clang flag: -mamdgpu-expand-waitcnt-profiling / -mno-amdgpu-expand-waitcnt-profiling
- Function attribute: "amdgpu-expand-waitcnt-profiling"

When enabled, instead of emitting a single waitcnt, the pass generates a sequence that waits for each outstanding operation individually. For example, if there are 5 outstanding memory operations and the target is to wait until 2 remain:

**Original**:

    s_waitcnt vmcnt(2)

**Expanded**:

    s_waitcnt vmcnt(4)
    s_waitcnt vmcnt(3)
    s_waitcnt vmcnt(2)

The expansion starts at (Outstanding - 1) and counts down to the target value, since waitcnt(Outstanding) would be a no-op (the counter is already at that value).

- Uses ScoreBrackets to determine the actual number of outstanding operations
- Only expands when operations complete in-order
- Skips expansion for mixed event types (e.g., LDS+SMEM on same counter)
- Skips expansion for scalar memory (always out-of-order)

Related previous work, for reference:

- **PR**: llvm/llvm-project#79236 (related `-amdgpu-waitcnt-forcezero`)

---------

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
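The countdown rule described in the message can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code in SIInsertWaitcnt: `expand_waitcnt` and its parameters are hypothetical names, and falling back to the single original wait when expansion is skipped or would add nothing is an assumption made for the sketch.

```python
def expand_waitcnt(counter: str, outstanding: int, target: int,
                   in_order: bool = True) -> list[str]:
    """Illustrative model of the expansion; hypothetical helper, not LLVM code.

    counter     -- counter name as printed in assembly, e.g. "vmcnt"
    outstanding -- number of operations currently in flight on that counter
    target      -- counter value the original s_waitcnt waits for
    in_order    -- False models the cases the message says are skipped
                   (mixed event types, scalar memory)
    """
    single = [f"s_waitcnt {counter}({target})"]

    # Assumption: when expansion is skipped, or when there is at most one
    # operation to retire, keep the single original wait.
    if not in_order or outstanding - 1 <= target:
        return single

    # Start at outstanding - 1 (waiting for `outstanding` would be a no-op)
    # and count down to the target, one wait per retired operation.
    return [f"s_waitcnt {counter}({n})"
            for n in range(outstanding - 1, target - 1, -1)]


for line in expand_waitcnt("vmcnt", outstanding=5, target=2):
    print(line)
```

With 5 outstanding operations and a target of 2, the loop prints the same three waits as the **Expanded** example in the message. A PC-sampling profiler can then attribute a stall to whichever of those waits the sampled PC falls on.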