onnxruntime
3e0e2084 - Fix CUDA ReduceSum crash on empty tensors with explicit axes

Commit

17 days ago

Fix CUDA ReduceSum crash on empty tensors with explicit axes

Remove the overly strict assertion that rejected reducing along a zero-sized dimension even with explicit axes. Reducing axis K of shape {N, 0} with keepdims=false produces shape {N} filled with the identity value (0 for sum), which is mathematically valid.

The CPU implementation already handles this case via check_and_reduce_empty_set_input(). The CUDA path now allows PrepareForReduce to succeed, and ReduceComputeCore (line 369) already handles input_count==0 correctly.

This fixes CUDA inference for models with dynamic KV cache where past_sequence_length=0 during prefill (e.g., Gemma4 via ORT GenAI).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
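The empty-reduction semantics described in the commit message can be illustrated with a small pure-Python sketch. This is not the onnxruntime code: reduce_sum is a hypothetical helper written only for illustration, and it assumes a flat, row-major tensor reduced along a single explicit axis. It shows why summing over a zero-sized axis of a {N, 0} tensor yields N zeros rather than an error.

```python
from math import prod
from typing import List, Sequence, Tuple


def reduce_sum(data: Sequence[float], shape: Sequence[int], axis: int,
               keepdims: bool = False) -> Tuple[List[float], List[int]]:
    """Sum a row-major tensor along one explicit axis (hypothetical helper)."""
    if len(data) != prod(shape):
        raise ValueError("data length does not match shape")

    outer = prod(shape[:axis])       # number of slices before the reduced axis
    reduced = shape[axis]            # size of the reduced axis, may be 0
    inner = prod(shape[axis + 1:])   # number of elements after the reduced axis

    # Every output element starts at the additive identity. When the reduced
    # axis has size 0, the loop over r never runs and the zeros are the result.
    out = [0.0] * (outer * inner)
    for o in range(outer):
        for r in range(reduced):
            for i in range(inner):
                out[o * inner + i] += data[(o * reduced + r) * inner + i]

    out_shape = list(shape)
    if keepdims:
        out_shape[axis] = 1
    else:
        del out_shape[axis]
    return out, out_shape


# Shape {3, 0}, reduce axis 1, keepdims=false: three zeros with shape {3}.
values, result_shape = reduce_sum([], [3, 0], axis=1)
print(values, result_shape)
```

The accumulator is initialized to the identity before any input is read, so the empty case needs no special branch in this sketch. The real CPU and CUDA kernels handle the empty case through their own code paths, as named in the commit message.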

References

#28353 - Fix CUDA ReduceSum erroring out on empty tensors with explicit axes

Author

justinchuby

Committer

justinchuby

Parents

b8f21f1e
