[AMDGPU] Fix missing waitcnt after buffer_wbl2 (#178316)
On GFX9, BUFFER_WBL2 is used to write back dirty cache lines and
requires an s_waitcnt vmcnt(0) afterwards to ensure completion.
This patch fixes the missing wait by incrementing vmcnt for the
buffer_wbl2 instruction.
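
For illustration, the sequence described above looks roughly like the
following. This is a sketch only: real output may wait on additional
counters or carry cache-control modifiers, depending on the subtarget
and the surrounding code.

```asm
buffer_wbl2            ; write back dirty cache lines
s_waitcnt vmcnt(0)     ; wait for the writeback to complete
```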
---------
Co-authored-by: Jay Foad <jay.foad@gmail.com>