[webgpu] support intel subgroup matrix on matmul_nbits (#24898)
This patch enables the Intel subgroup matrix on the matmul_nbits operator.
For now it is supported only on the Vulkan backend and the xe-2lpg arch;
support for more subgroup matrix configs and platforms will be added later.