[Clang] Improve scan in gpuintrin.h (#189381)
Summary:
Right now the scan checks to avoid the unspecified behavior in
`clzg(0)`. This is used as the source to the shuffle instruction, but
the argument is discarded at zero anyway. So, we simply pass unspecified
behavior to shuffle and then discard it. This should be fine. The scan
routines are expected to be optimal.
Also renames `sum` to `add`.