llvm-project
ed0ba3cb - [AMDGPU] Align loop headers to prevent instruction fetch split on GFX950 (#181999)

Commit
96 days ago
[AMDGPU] Align loop headers to prevent instruction fetch split on GFX950 (#181999)

On GFX9, the instruction sequencer fetches 32 bytes at a time. When an 8-byte instruction at a loop header straddles a 32-byte fetch window boundary, the sequencer must perform two fetches after a backward branch, incurring a delay. On GFX950, this causes additional performance issues.

This patch adds 32-byte alignment (.p2align 5, , 4) for loop headers on GFX950 when the first real instruction is 8 bytes. At most one s_nop (4 bytes, 1 quad-cycle before the loop) is used for padding. If more than 4 bytes of padding were needed, the 8-byte instruction would not straddle a 32-byte boundary anyway, so alignment is skipped.

Note: the alignment decision is made during block-placement, before si-insert-waitcnts. In loops where a 4-byte S_WAITCNT is later inserted as the first instruction, the alignment becomes redundant but mostly harmless (at most one extra s_nop per affected loop).

Assisted-by: Claude (Anthropic)
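The padding arithmetic described above can be sketched as follows. This is an illustrative model only, not code from the patch: the names straddlesFetchWindow and paddingBytes are hypothetical, and it assumes instruction offsets are always multiples of 4 bytes, which is what makes a 4-byte skip limit sufficient.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the commit message: 32-byte fetch window, and at most
// one 4-byte s_nop of padding (the third operand of ".p2align 5, , 4").
constexpr uint64_t FetchWindowBytes = 32;
constexpr uint64_t MaxPaddingBytes = 4;

// Hypothetical helper: true if an instruction of SizeBytes placed at Offset
// crosses a 32-byte fetch window boundary.
constexpr bool straddlesFetchWindow(uint64_t Offset, uint64_t SizeBytes) {
  return (Offset % FetchWindowBytes) + SizeBytes > FetchWindowBytes;
}

// Hypothetical helper: bytes of padding that ".p2align 5, , 4" emits at
// Offset. The directive aligns to 32 bytes only if that takes at most
// 4 bytes; otherwise it emits nothing.
constexpr uint64_t paddingBytes(uint64_t Offset) {
  uint64_t Needed =
      (FetchWindowBytes - Offset % FetchWindowBytes) % FetchWindowBytes;
  return Needed <= MaxPaddingBytes ? Needed : 0;
}
```

Under the 4-byte-offset assumption, an 8-byte instruction straddles a boundary only at offset 28 within a window, which is exactly the case where 4 bytes of padding suffice. At every other offset paddingBytes returns 0 and the instruction already fits within one fetch window.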