[js/webgpu] allow a ProgramInfo's RunData to use zero-sized output (#19614)
### Description
This PR allows a program's outputs to be zero-sized.
To keep the implementation simple, it does not support a mix of zero-sized
and non-zero-sized outputs: either all outputs are zero-sized, or an
error is reported.
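
The all-or-nothing rule can be sketched roughly as below. This is only an illustration, not the code in this PR: `elementCount` and `allOutputsZeroSized` are hypothetical names, and the assumption is that a caller would skip the GPU dispatch when the function returns `true`.

```typescript
// Hypothetical sketch of the "all outputs zero-sized, or error" rule.
type Dims = readonly number[];

// Number of elements for the given dims; an empty dims list (scalar) counts as 1.
const elementCount = (dims: Dims): number => dims.reduce((acc, d) => acc * d, 1);

// Returns true when every output is zero-sized, false when none is,
// and throws when only some outputs are zero-sized.
function allOutputsZeroSized(outputDims: readonly Dims[]): boolean {
  const zeroCount = outputDims.filter((dims) => elementCount(dims) === 0).length;
  if (zeroCount === 0) {
    return false;
  }
  if (zeroCount < outputDims.length) {
    throw new Error('Partially zero-sized outputs are not supported.');
  }
  return true;
}
```

Rejecting the mixed case up front means the run path only has to handle two situations: a normal dispatch, or nothing to compute at all.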
Added 2 tests:
- an op test of `Add` with inputs T[2,0] and T[2,1], and
- `test_split_zero_size_splits`
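
For the `Add` case, the output is zero-sized because broadcasting [2,0] against [2,1] yields [2,0]. A minimal sketch of that shape calculation, assuming NumPy-style multidirectional broadcasting (`broadcastDims` is a hypothetical helper, not the one used in the repository):

```typescript
// Hypothetical sketch: NumPy-style broadcast of two shapes, aligned from the last axis.
function broadcastDims(a: readonly number[], b: readonly number[]): number[] {
  const rank = Math.max(a.length, b.length);
  const out: number[] = new Array<number>(rank).fill(1);
  for (let i = 0; i < rank; i++) {
    const ia = a.length - 1 - i;
    const ib = b.length - 1 - i;
    // Missing leading axes behave like size 1.
    const da = ia >= 0 ? a[ia] : 1;
    const db = ib >= 0 ? b[ib] : 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error(`Cannot broadcast ${da} against ${db}.`);
    }
    // A size-1 axis stretches to the other size, including 0.
    out[rank - 1 - i] = da === 1 ? db : da;
  }
  return out;
}

const addOutputDims = broadcastDims([2, 0], [2, 1]);
const addOutputSize = addOutputDims.reduce((acc, d) => acc * d, 1);
```

Here `addOutputDims` is `[2, 0]` and `addOutputSize` is `0`, so the single output of `Add` is zero-sized.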