onnxruntime
3e0e2084 - Fix CUDA ReduceSum crash on empty tensors with explicit axes

Commit
17 days ago
Remove the overly strict assertion that rejected reducing along a zero-sized dimension even with explicit axes. Reducing axis K of shape {N, 0} with keepdims=false produces shape {N} filled with the identity value (0 for sum), which is mathematically valid.

The CPU implementation already handles this case via check_and_reduce_empty_set_input(). The CUDA path now allows PrepareForReduce to succeed, and ReduceComputeCore (line 369) already handles input_count==0 correctly.

This fixes CUDA inference for models with dynamic KV cache where past_sequence_length=0 during prefill (e.g., Gemma4 via ORT GenAI).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
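The expected result of reducing a zero-sized axis can be illustrated with a small reference sketch in plain Python. This is not the ONNX Runtime CUDA or CPU code; the function name `reduce_sum` and its flat row-major input layout are assumptions made for illustration only. It shows that the output starts as the sum identity (0) and that an empty input simply leaves it untouched.

```python
from math import prod


def reduce_sum(data, shape, axes, keepdims=False):
    """Reference ReduceSum over a flat row-major list (hypothetical helper).

    Returns (output_shape, output_values).
    """
    rank = len(shape)
    axes = {a % rank for a in axes}  # normalize negative axes

    # Shape with reduced axes collapsed to 1; used for indexing the output.
    kept = [1 if i in axes else d for i, d in enumerate(shape)]

    # Start from the identity of sum. If the input has zero elements,
    # the loop below never runs and this is the final answer.
    out = [0.0] * prod(kept)

    for flat, value in enumerate(data):
        # Decode the row-major flat index into a multi-index.
        rem, idx = flat, []
        for d in reversed(shape):
            idx.append(rem % d)
            rem //= d
        idx.reverse()

        # Map to the output position, ignoring the reduced axes.
        out_flat = 0
        for i, (k, d) in enumerate(zip(idx, kept)):
            out_flat = out_flat * d + (0 if i in axes else k)
        out[out_flat] += value

    out_shape = kept if keepdims else [d for i, d in enumerate(shape) if i not in axes]
    return out_shape, out


# Shape {N, 0} with N = 3, reducing axis 1, keepdims=false -> shape {3} of zeros.
print(reduce_sum([], [3, 0], [1]))
```

With `keepdims=True` the same call yields shape `[3, 1]`, still filled with zeros.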