webgpu / nbitmm support for bias and weight_index (#26392)
Add support for bias and weight_index, move subgroup_matrix_matmul_nbits
to a template, and make the program callable from other ops.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>