clang/AMDGPU: Add __builtin_amdgcn_inverse_ballot_w{32,64} (#155724)
Add builtins that expose the existing llvm.amdgcn.inverse.ballot
intrinsic, which has been available at the IR level for a while.
This makes it possible to write code that selects or branches on lane
masks explicitly, which can lead to better generated code.
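
As a rough illustration of the intended semantics, here is a host-side
model in plain C. It is only a sketch: the helper names are hypothetical
and not part of clang, and it assumes the builtin takes a wave-uniform
mask and gives each lane the bit of that mask at its own lane index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model, not the builtin itself: the value that
   lane `lane` of a 32-wide wave is assumed to observe for a uniform
   mask, i.e. bit `lane` of the mask. */
static bool model_inverse_ballot_w32(uint32_t mask, unsigned lane) {
    assert(lane < 32);
    return (mask >> lane) & 1u;
}

/* Models a per-lane select driven by a lane mask: lanes whose bit is set
   take `a`, all other lanes take `b`. On the device, the assumed
   equivalent is `__builtin_amdgcn_inverse_ballot_w32(mask) ? a : b`. */
static int model_select_by_mask(uint32_t mask, unsigned lane, int a, int b) {
    return model_inverse_ballot_w32(mask, lane) ? a : b;
}
```

In device code the lane index is implicit, so the select reads directly
in terms of the mask rather than a per-lane comparison.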