[Clang] Correctly link and handle PGO options on the GPU (#185761)
Summary:
Currently, the GPU targets ignore the standard profiling arguments. This
PR changes the behavior to use the standard handling, which links the in
the now-present `libclang_rt.profile.a` if the user built with the
compiler-rt support enabled. If it is not present this is a linker error
and we can always suppress with `-Xarch_host` and `-Xarch_device`.
Hopefully this doesn't cause some people pain if they're used to doing
`-fprofile-generate` on a CPU unguarded since it was a stange mix of a
no-op and not a no-op on the GPU until now.