Address review: consistent CPU/CUDA empty-set behavior and extended tests
- Fix ReduceMean comment: clarify that the 0 fill is for CPU/CUDA consistency
  (ReduceAggregatorMean inherits ReduceAggregatorSum::fill_for_empty_set)
- Add ReduceMin_EmptyTensor_DefaultAxes test (reduce-all, scalar output)
- Add ReduceMin_EmptyTensor_DefaultAxes_KeepDims test
- Add ReduceMean_EmptyTensor_DefaultAxes test
- Update test header comment: ReduceMean over an empty set yields 0 (not NaN)
- All tests are EP-independent (no #ifdef USE_CUDA guards)
Signed-off-by: Justin Chu <justinchu@microsoft.com>