[AMDGPU][SIMemoryLegalizer][GFX12] Correctly insert sample/bvhcnt (#161637)
The existing check was not strong enough to prevent the insertion of sample/bvhcnt waits when they were not needed.
I assume SIInsertWaitcnts was trimming those away anyway, but this was a bug nonetheless.
We were inserting SAMPLE/BVHcnt waits in places where we only needed to wait on the previous atomic operation. Neither of these counters has any atomics associated with it, so such a wait can never be required there.
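
For illustration only, here is a minimal standalone sketch of the intended decision. It is not the code in SIMemoryLegalizer: the function `waitsForPriorLoads` and the parameters `AtomicsOnly` and `HasImageInsts` are hypothetical names made up for this example, and the strings only stand in for the GFX12 wait instructions that would be emitted.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the decision, not the actual LLVM implementation.
// AtomicsOnly:   the wait only has to cover a previous atomic operation.
// HasImageInsts: the target has image instructions at all (assumed gate).
std::vector<std::string> waitsForPriorLoads(bool AtomicsOnly,
                                            bool HasImageInsts) {
  std::vector<std::string> Waits;
  // No atomics are tracked by SAMPLEcnt or BVHcnt, so these waits are only
  // useful when non-atomic operations must be covered as well.
  if (!AtomicsOnly && HasImageInsts) {
    Waits.push_back("s_wait_bvhcnt 0x0");
    Waits.push_back("s_wait_samplecnt 0x0");
  }
  // Loads, including atomic loads, are tracked by LOADcnt.
  Waits.push_back("s_wait_loadcnt 0x0");
  return Waits;
}
```

In this sketch, a wait that only needs to cover a previous atomic yields a single LOADcnt wait, while the general case on a target with image instructions yields all three.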