onnxruntime
9c2b0c31 - Use CUDART_VERSION reduction compatibility in GQA attention (#28296)

Committed 7 days ago
Use CUDART_VERSION reduction compatibility in GQA attention (#28296)

### Description

Update `/home/runner/work/onnxruntime/onnxruntime/onnxruntime/contrib_ops/cuda/bert/gqa_unfused_attention.cu` to match the existing CUDA attention compatibility pattern used elsewhere in the repo.

- Replace the local reduction functors with the established `CUDART_VERSION >= 12090` guards.
- Use `::cuda::maximum()` and `::cuda::std::plus()` for CUDA 12.9+.
- Keep `cub::Max()` and `cub::Sum()` as the fallback for older toolkits.

### Motivation and Context

This keeps the GQA unfused attention kernel consistent with nearby CUDA attention code and avoids the CUDA 12.9+ deprecation issue around the old CUB reduction functors while preserving compatibility with older CUDA toolkits.

Validation:
- `git diff --check`
- Code review validation: no comments
- CodeQL validation: no analyzable language changes detected

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>