Validate token_id bounds in NGramRepeatBlock to prevent OOB write (#28039)
### Description
In `NGramRepeatBlock` (CPU and CUDA EP), token values from the
`input_ids` tensor are used directly as array indices into the `scores`
output buffer without adequate bounds checking. The CPU path only
checked `token_id < vocab_size` (missing lower bound), and the CUDA
kernel had no bounds checks at all. A crafted model with negative token
IDs can therefore trigger writes at attacker-controlled negative offsets,
causing heap corruption or a SIGSEGV.
Fixes https://portal.microsofticm.com/imp/v5/incidents/details/31000000558069/summary
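The unguarded indexing pattern can be sketched as follows. This is an illustrative simplification, not the actual ORT kernel code; `ScoresOffset` and its parameters are hypothetical names. It shows how an unvalidated token id becomes a signed offset into the `scores` buffer, so a negative token id lands before the start of the allocation:

```cpp
#include <cstdint>

// Illustrative only: simplified from the real kernel, which writes
// scores[beam * vocab_size + token_id] = -inf to ban a repeated n-gram.
// Nothing here constrains token_id to [0, vocab_size), so a negative
// token_id produces a negative (out-of-bounds) offset.
int64_t ScoresOffset(int64_t beam, int64_t vocab_size, int64_t token_id) {
  return beam * vocab_size + token_id;
}
```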
### Changes
- **ngram_repeat_block.h** (CPU): Replace `ORT_ENFORCE(token_id <
vocab_size)` with a full `[0, vocab_size)` bounds check that returns an
`INVALID_ARGUMENT` Status via an atomic error flag (avoids `abort()`
under `ORT_NO_EXCEPTIONS`)
- **ngram_repeat_block_impl.cu** (CUDA): Add `CUDA_KERNEL_ASSERT` plus a
bounds guard that skips the write, for safety in release builds
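The hardened CPU-side check can be sketched like this. It is a minimal standalone illustration under assumed names (`CheckTokenIds`, a plain `bool` in place of ORT's `Status`), not the actual patch:

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Sketch of the hardened check: every token id read from input_ids must
// lie in [0, vocab_size) before it is used to index the scores buffer.
// An atomic flag records a violation instead of aborting, mirroring the
// ORT_NO_EXCEPTIONS-safe error path; in the real kernel the loop body
// runs across threads, and the relaxed atomic store lets any thread
// report an out-of-range token without a data race.
bool CheckTokenIds(const std::vector<int64_t>& input_ids, int64_t vocab_size) {
  std::atomic<bool> invalid{false};
  for (int64_t token_id : input_ids) {
    if (token_id < 0 || token_id >= vocab_size) {
      invalid.store(true, std::memory_order_relaxed);
    }
  }
  // Caller converts a false result into an INVALID_ARGUMENT Status.
  return !invalid.load(std::memory_order_relaxed);
}
```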
### Tests
Two regression tests: one with a negative token_id and one with
token_id >= vocab_size (CPU EP only; the CUDA EP is excluded to avoid
debug-assert context corruption).
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>