[js/webgpu] Support Reshape/Shape 21+ on jsep (#21871)
### Description
<!-- Describe your changes. -->
#21618
With this PR, the cross device copying (`MemcpyToHost`) can totally be
removed for model `wav2vec2`. And the overall time becomes 48ms from
604ms.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->