Fix Inverse kernel rank underflow before indexing trailing dims (#28400)
### Description
`Inverse::Compute` (CPU and CUDA) read `dims[num_dim - 2]` and
`dims[num_dim - 1]` before validating `num_dim >= 2`. Because `num_dim`
is `size_t`, the subtraction wraps around for scalar or 1-D inputs
instead of going negative, which produces out-of-bounds indices.
- **`contrib_ops/cpu/inverse.cc`** — add `ORT_RETURN_IF_NOT(num_dim >=
2, ...)` immediately after shape retrieval, before any dimension
indexing.
- **`contrib_ops/cuda/inverse.cc`** — same guard in `ComputeInternal`.
- **`test/contrib_ops/inverse_test.cc`** — add `scalar_input_fails` and
`one_dim_input_fails` test cases.
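
The guard ordering described above can be sketched in isolation. This is a minimal standalone illustration, not the kernel code: `TrailingMatrixDims` is a hypothetical helper, and it returns `std::nullopt` where the real kernels return an error status through `ORT_RETURN_IF_NOT`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ONNX Runtime): returns the trailing
// (rows, cols) dimensions, or std::nullopt when the rank is below 2.
std::optional<std::pair<int64_t, int64_t>> TrailingMatrixDims(
    const std::vector<int64_t>& dims) {
  const std::size_t num_dim = dims.size();
  // Validate the rank first: for num_dim < 2, `num_dim - 2` would wrap
  // around to a very large unsigned value.
  if (num_dim < 2) {
    return std::nullopt;
  }
  // Safe to index now that num_dim >= 2 is established.
  return std::make_pair(dims[num_dim - 2], dims[num_dim - 1]);
}
```

With this ordering, a 0-D or 1-D shape is rejected before any element of `dims` is read, while a shape such as `{4, 5, 6}` yields the trailing pair `(5, 6)`.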
### Motivation and Context
Without the guard, passing a 0-D or 1-D tensor to the Inverse op causes
`num_dim - 2` to wrap around in `size_t` arithmetic, so the kernel reads
out-of-bounds memory before any validation occurs. The fix is a
defense-in-depth kernel-level check that mirrors the existing ONNX
shape-inference rejection.
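
To make the wraparound concrete, the following sketch evaluates the unguarded index expression at compile time. `UnguardedRowIndex` is a hypothetical name used only for illustration; it reproduces the arithmetic, not the kernel.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: the index `num_dim - 2` as unguarded code would
// compute it. Unsigned arithmetic wraps modulo 2^N, so the result is
// never negative.
constexpr std::size_t UnguardedRowIndex(std::size_t num_dim) {
  return num_dim - 2;
}

// Rank 1 wraps to the largest representable size_t value.
static_assert(UnguardedRowIndex(1) == std::numeric_limits<std::size_t>::max(),
              "rank 1 wraps to SIZE_MAX");
// Rank 0 wraps to one below that.
static_assert(UnguardedRowIndex(0) ==
                  std::numeric_limits<std::size_t>::max() - 1,
              "rank 0 wraps to SIZE_MAX - 1");
// Rank 3 gives the expected in-range index.
static_assert(UnguardedRowIndex(3) == 1, "rank 3 indexes element 1");
```

Because the wrapped value is a valid `size_t`, nothing fails at the subtraction itself; the problem only appears when the value is used as an index, which is why the rank check has to come first.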
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>