Fix Linux CUDA 13.3 build (abseil + CCCL parse errors) (#29042)
### Description
Fixes two NVCC 13.3 (`cudafe++` / EDG front-end) parse regressions that
break the Linux CUDA build of ONNX Runtime. Both are host-side parser
bugs in the CUDA 13.3 toolkit that reject valid C++ which compiles fine
on CUDA 13.2 and earlier.
1. **Abseil member alias template.** NVCC 13.3 mis-parses the
qualified-id `IfRRef<...>::AddPtr<Other>` used inside abseil's
`insert_or_assign` / `try_emplace` macros, failing with `using template
type parameter ... after 'typename'`. A new patch introduces a top-level
alias template `IfRRefAddPtr<T, Other>` and routes the macros through
it. Because it stays an alias template, substitution remains in the
immediate context, so forming a pointer-to-reference is still a soft
(SFINAE) failure rather than a hard error — the original behavior is
preserved.
2. **CCCL global-qualified partial specializations.**
`<cub/device/device_transform.cuh>` and
`<cub/device/dispatch/tuning/tuning_transform.cuh>` declare `struct
::cuda::proclaims_copyable_arguments<...> : ::cuda::std::true_type {};`
at global scope, which NVCC 13.3 rejects with `global qualification of
class name is invalid before ':' token`. Since the affected headers ship
inside the (often read-only) CUDA toolkit, the build now generates
corrected copies — rewriting the specializations into namespace-reopened
form (`_CCCL_BEGIN_NAMESPACE_CUDA ... _CCCL_END_NAMESPACE_CUDA`) — into
the build tree and places that directory ahead of the toolkit CCCL
include path. The transform is a no-op on toolkits that do not contain
the offending pattern, so it is safe to keep enabled across CUDA
versions.
### Summary of changes
| File | Change |
|------|--------|
| `cmake/patches/abseil/absl_cuda13_member_template.patch` | New patch adding the `IfRRefAddPtr` alias template and rewriting the abseil container macros to use it. |
| `cmake/vcpkg-ports/abseil/absl_cuda13_member_template.patch` | Same patch copied into the vcpkg overlay port (vcpkg looks for patches in the port directory). |
| `cmake/vcpkg-ports/abseil/portfile.cmake` | Add the new patch to the abseil overlay port's `PATCHES` list. |
| `cmake/external/abseil-cpp.cmake` | Apply the new patch in the non-vcpkg FetchContent path (both Windows and non-Windows branches). |
| `cmake/onnxruntime_providers_cuda.cmake` | Add `ort_cuda13_patch_cccl_header()` and, for CUDA >= 13.0, generate fixed CCCL headers into the build tree and prepend that directory to the CUDA include path. |
### Motivation and Context
The CUDA 13.3 toolkit introduced `cudafe++` parser regressions that
reject valid template code accepted by CUDA 13.2 and earlier, so the
Linux CUDA build fails before producing any libraries. These workarounds
restore the build on CUDA 13.3 without affecting other CUDA versions:
the CCCL header rewrite is a no-op on toolkits that lack the offending
pattern, and the abseil patch preserves the original SFINAE behavior.
- Related upstream issue:
https://github.com/abseil/abseil-cpp/issues/2075
### How was this tested?
- Full Linux build with CUDA 13.3 + cuDNN 9.23
(`CMAKE_CUDA_ARCHITECTURES="89;90"`, Release) completes successfully and
produces the `onnxruntime_gpu` wheel; the two previously-failing
translation units (`bias_softmax_impl.cu` and `moe_kernel.cu`) now
compile.
- The CMake-generated CCCL headers were verified to be byte-identical to
a manually fixed reference copy, with which the affected files compile
successfully (exit status 0).