[Clang] Add more scan / reduce operations to 'gpuintrin.h' (#185525)
Summary:
This builds on the existing scan / reduce pattern in 'gpuintrin.h' to
add support for more of the standard operations. The reductions could
conceivably use the AMDGPU builtins later, once we can enable DPP or
other optimizations.
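
For readers unfamiliar with the pattern, the following is a minimal
host-side sketch of how a lane-wide reduce and inclusive scan can be
written generically over a binary operation. It is not the code in this
patch: it simulates the lanes with a plain array, and every name in it
(SIM_LANES, sim_op_t, sim_lane_reduce, sim_lane_scan, sim_add, sim_max)
is hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical lane count for the simulation; must be a power of two. */
#define SIM_LANES 8
_Static_assert((SIM_LANES & (SIM_LANES - 1)) == 0, "power of two required");

/* Any associative (and, for the reduce, commutative) binary operation. */
typedef uint32_t (*sim_op_t)(uint32_t, uint32_t);

static uint32_t sim_add(uint32_t a, uint32_t b) { return a + b; }
static uint32_t sim_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Butterfly reduce: each step combines a lane with its partner at
 * lane ^ step. After log2(SIM_LANES) steps every lane holds the result. */
static void sim_lane_reduce(uint32_t v[SIM_LANES], sim_op_t op) {
  for (uint32_t step = 1; step < SIM_LANES; step *= 2) {
    uint32_t next[SIM_LANES];
    for (uint32_t lane = 0; lane < SIM_LANES; ++lane)
      next[lane] = op(v[lane], v[lane ^ step]);
    for (uint32_t lane = 0; lane < SIM_LANES; ++lane)
      v[lane] = next[lane];
  }
}

/* Inclusive scan (Hillis-Steele style): each step combines a lane with
 * the value held 'step' lanes below it; lower lanes keep their value. */
static void sim_lane_scan(uint32_t v[SIM_LANES], sim_op_t op) {
  for (uint32_t step = 1; step < SIM_LANES; step *= 2) {
    uint32_t next[SIM_LANES];
    for (uint32_t lane = 0; lane < SIM_LANES; ++lane)
      next[lane] = lane >= step ? op(v[lane - step], v[lane]) : v[lane];
    for (uint32_t lane = 0; lane < SIM_LANES; ++lane)
      v[lane] = next[lane];
  }
}
```

On a real target the array reads would be cross-lane shuffles, which is
where DPP or similar hardware paths could replace the generic loop.
Passing the operation as a parameter is only a convenience of this
sketch; it says nothing about how the header itself is structured.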