[CUDA/HIP][SYCL] Deduplicate deferred diagnostics across multiple callers (#185926)
Deferred diagnostics for a function were emitted once per caller that
forced the function into device context. When multiple device functions
called the same host-device function containing errors, the diagnostics
were repeated for each caller, producing noisy duplicate output.
Change the deferred diagnostic emission to a two-pass approach:
1. During the call graph walk, collect callers in DeviceKnownEmittedFns
   (now storing multiple callers per function) and mark functions that
   need diagnostics, but don't emit yet.
2. After the walk completes, emit diagnostics once per function with
   all callers listed as notes.
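The two passes above can be illustrated with a minimal standalone sketch. This is not the Clang implementation: `CallGraph`, `WalkState`, `visit`, and `emitDeferred` are hypothetical names, functions are plain strings, and only `WalkState::Callers` is meant to mirror the role of DeviceKnownEmittedFns. Only direct callers are listed as notes here.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the call graph; not Clang's real data structures.
struct CallGraph {
  std::map<std::string, std::vector<std::string>> Callees;  // caller -> callees
  std::map<std::string, std::vector<std::string>> Deferred; // function -> deferred messages
};

struct WalkState {
  // Plays the role of DeviceKnownEmittedFns: callee -> every caller seen so far.
  std::map<std::string, std::vector<std::string>> Callers;
  std::set<std::string> Visited;
  std::vector<std::string> NeedsDiags; // marked functions, in discovery order
};

// Pass 1: walk the graph, record callers, mark functions; emit nothing.
void visit(const CallGraph &G, const std::string &Fn, WalkState &S) {
  if (!S.Visited.insert(Fn).second)
    return; // already walked; the extra caller was recorded by our caller
  if (G.Deferred.count(Fn))
    S.NeedsDiags.push_back(Fn);
  auto It = G.Callees.find(Fn);
  if (It == G.Callees.end())
    return;
  for (const std::string &Callee : It->second) {
    std::vector<std::string> &Known = S.Callers[Callee];
    if (std::find(Known.begin(), Known.end(), Fn) == Known.end())
      Known.push_back(Fn); // record this caller even if Callee was visited
    visit(G, Callee, S);
  }
}

// Pass 2: after the walk, emit each marked function's diagnostics once,
// followed by one note per recorded caller.
std::vector<std::string> emitDeferred(const CallGraph &G,
                                      const std::vector<std::string> &Roots) {
  WalkState S;
  for (const std::string &Root : Roots)
    visit(G, Root, S);

  std::vector<std::string> Out;
  for (const std::string &Fn : S.NeedsDiags) {
    assert(G.Deferred.count(Fn) && "only marked functions reach pass 2");
    for (const std::string &Msg : G.Deferred.at(Fn))
      Out.push_back("error: " + Fn + ": " + Msg);
    auto It = S.Callers.find(Fn);
    if (It != S.Callers.end())
      for (const std::string &Caller : It->second)
        Out.push_back("note: called by " + Caller);
  }
  return Out;
}
```

With two roots that both call one function carrying a single deferred message, this sketch yields one error line followed by two notes, rather than the error repeated per caller.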
Call chain notes now use "called by" for the first caller in each chain
and "which is called by" for subsequent callers in the chain, making it
easy to distinguish separate call chains.
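A small sketch of the labeling rule, assuming each call chain is already available as an ordered list of caller names. `formatChainNotes` is a hypothetical helper, and the exact note text Clang prints (quoting, source locations) may differ from these plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn call chains into note lines. The first caller in
// each chain gets "called by"; later callers get "which is called by".
std::vector<std::string>
formatChainNotes(const std::vector<std::vector<std::string>> &Chains) {
  std::vector<std::string> Notes;
  for (const std::vector<std::string> &Chain : Chains) {
    assert(!Chain.empty() && "each chain needs at least one caller");
    for (std::size_t I = 0; I < Chain.size(); ++I)
      Notes.push_back((I == 0 ? "called by " : "which is called by ") +
                      Chain[I]);
  }
  return Notes;
}
```

Because every chain restarts with "called by", a reader can tell where one chain ends and the next begins without any extra separator.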
Also add documentation for deferred diagnostics and the concept of
HD-promoted functions to the HIP and CUDA docs.
Fixes: https://github.com/llvm/llvm-project/issues/180638