Validate g_idx values in MatMulNBits to prevent OOB read (#27582)
### Description
In `Dequantize4BitsKernelReOrder` (CPU and CUDA EP), values from the
`g_idx` tensor are used directly as array indices into the `scales` and
`zero_points` buffers without bounds checking. This PR adds value-range
validation and tests for the `g_idx` input tensor in the `MatMulNBits`
operator.
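
As a rough illustration of the kind of check described above, the sketch below validates every `g_idx` entry before it is used as an index. It is not the actual onnxruntime code. `GatherScales` is a hypothetical helper, and `scales` is simplified to a single row with one entry per group, so the valid range is assumed to be `[0, scales.size())`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of onnxruntime): looks up one scale per
// element through g_idx, but only after checking that every index is a
// valid group index. Returns false and leaves `out` empty if any g_idx
// value falls outside [0, scales.size()).
bool GatherScales(const std::vector<float>& scales,
                  const std::vector<int32_t>& g_idx,
                  std::vector<float>& out) {
  out.clear();
  const int64_t num_groups = static_cast<int64_t>(scales.size());

  // Validation pass: reject negative or too-large group indices up front,
  // so the gather loop below never reads out of bounds.
  for (int32_t g : g_idx) {
    if (g < 0 || static_cast<int64_t>(g) >= num_groups) {
      return false;
    }
  }

  // Gather pass: every index is now known to be in range.
  out.reserve(g_idx.size());
  for (int32_t g : g_idx) {
    out.push_back(scales[static_cast<std::size_t>(g)]);
  }
  return true;
}
```

Validating once up front, rather than inside the dequantization loop, keeps the hot path free of per-element branches. Whether the actual change does this on the host or inside the kernel is not stated here.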
### Motivation and Context
Without this validation, an out-of-range `g_idx` value causes an out-of-bounds read of the `scales` or `zero_points` buffer.
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>