onnxruntime
cd05ef40 - Fix int32 overflow in CUDA Cast and UnaryElementWise kernels for tensors with >2^31 elements (#28386)

Commit

64 days ago

Fix int32 overflow in CUDA Cast and UnaryElementWise kernels for tensors with >2^31 elements (#28386)

- [x] Fix `unary_elementwise_impl.cuh`: Change `CUDA_LONG` to `int64_t` for `N` parameter and loop index in `_UnaryElementWise` kernel, and fix `blocksPerGrid` calculation
- [x] Fix `cast_op.cu`: Change `CUDA_LONG` to `int64_t` for `N` parameter and loop index in `CastKernelStd`, `CastKernelSat`, and `CudaCastPairwiseKernel` kernels, and remove `static_cast<int>` truncation
- [x] Use `size_t` for `pair_count` in CudaCastPairwise to avoid double conversion (review feedback)
- [x] Rename test to `CastKernelCorrectness_ModerateSize` and add `CastKernel_Int64IndexArithmetic_NoOverflow` host-side test (review feedback)
- [x] Merge from main to resolve conflicts with Float8E8M0 tests

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
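The index arithmetic behind this fix can be checked on the host without allocating a multi-gigabyte tensor. The following is a minimal C++ sketch of that idea, not the code from the commit: the constants `kThreadsPerBlock` and `kElementsPerThread` and the helpers `FitsInInt32`, `BlocksPerGrid`, and `FirstElementIndex` are hypothetical names chosen for illustration, and it assumes `CUDA_LONG` is a 32-bit signed type, as the commit title implies.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical launch parameters, chosen for illustration only.
constexpr int64_t kThreadsPerBlock = 256;
constexpr int64_t kElementsPerThread = 4;

// True when an element count can be stored in a 32-bit signed index.
constexpr bool FitsInInt32(size_t count) {
  return count <= static_cast<size_t>(std::numeric_limits<int32_t>::max());
}

// Ceiling division done entirely in 64-bit arithmetic, so the element
// count is never narrowed to int before the block count is computed.
constexpr int64_t BlocksPerGrid(size_t count) {
  const int64_t n = static_cast<int64_t>(count);
  const int64_t per_block = kThreadsPerBlock * kElementsPerThread;
  return (n + per_block - 1) / per_block;
}

// Index of the first element a thread handles, mirroring the usual
// blockIdx * blockDim * elements_per_thread + threadIdx pattern.
// All operands are int64_t, so the product cannot wrap at 2^31.
constexpr int64_t FirstElementIndex(int64_t block_idx, int64_t thread_idx) {
  return block_idx * kThreadsPerBlock * kElementsPerThread + thread_idx;
}
```

With these assumed parameters, a tensor of 2^31 + 5 elements does not fit a 32-bit signed count, needs 2097153 blocks, and the last block starts at element index 2^31, which is exactly the value a 32-bit signed loop index cannot represent.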

References

#28386 - Fix int32 overflow in CUDA Cast and UnaryElementWise kernels for tensors with >2^31 elements

Author

Copilot

Parents

2d9f35fc
