onnxruntime
582c947b - Fix NonZero CUDA kernel to check kernel launch errors via cudaGetLastError()

Commit
31 days ago
NonZeroCountEachBlock and NonZeroOutputPositions unconditionally returned cudaSuccess after CUDA kernel launches. This swallowed any launch errors and left them in the CUDA runtime error state, where subsequent CUB DeviceScan calls picked them up as confusing cudaErrorInvalidDevice (101) errors.

Replace return cudaSuccess with return cudaGetLastError() to properly detect and propagate kernel launch failures, matching the pattern used by other CUDA kernel wrappers in the codebase.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/1c0b0b59-00b3-481b-af23-4aa8989035fd
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
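The pattern the message describes can be sketched as follows. This is a minimal illustration, not the actual onnxruntime code: the kernel CountNonZeroKernel and the wrapper LaunchCountNonZero are hypothetical names invented for this example. Only cudaGetLastError(), cudaError_t, cudaStream_t, and atomicAdd are real CUDA APIs.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (not the onnxruntime implementation): each thread
// inspects one element and bumps a global counter if it is nonzero.
__global__ void CountNonZeroKernel(const int* x, int n, int* count) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && x[i] != 0) {
    atomicAdd(count, 1);
  }
}

// Hypothetical host wrapper. A kernel launch itself returns nothing, so a
// failed launch (for example, an invalid block size) is only visible through
// the runtime's error state. Returning cudaGetLastError() reports that error
// to the caller and clears it, instead of leaving it for a later, unrelated
// CUDA call to trip over.
cudaError_t LaunchCountNonZero(const int* x, int n, int* count,
                               int threads_per_block, cudaStream_t stream) {
  int blocks = (n + threads_per_block - 1) / threads_per_block;
  CountNonZeroKernel<<<blocks, threads_per_block, 0, stream>>>(x, n, count);
  // Before the fix described above, the equivalent line was
  // `return cudaSuccess;`, which hid launch failures.
  return cudaGetLastError();
}
```

With this wrapper, a deliberately invalid launch such as LaunchCountNonZero(nullptr, 1, nullptr, 2048, 0) should come back with a launch error (on current hardware, where 1024 threads per block is the limit, cudaErrorInvalidConfiguration) rather than cudaSuccess, and the error state is reset so later calls are not affected. Note that cudaGetLastError() only catches launch-time failures; errors raised while the kernel executes surface later, at a synchronizing call.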