AMDGPU: Rename and add bf16 support for global_load_tr builtins (#86202)
Make the name of a clang builtin as close to the mnemonic instruction
name as possible. The data type suffix may not be enough to tell what
instruction the builtin is going to produce.
This patch also add the bf16 support for global_load_tr_b128 builtins.