Validate seqlens_k against cos_cache bounds in GroupQueryAttention to… (#28277)
### Description
Validate `seqlens_k` values against `cos_cache.shape[0]` in
`GroupQueryAttention::Compute()` when `do_rotary` is enabled, to prevent
out-of-bounds reads in the rotary embedding lookup.
### Root Cause
`CheckRotaryCaches()` validates only that `cos_cache.shape[0] >=
total_sequence_length`, but runtime position IDs are derived from
`seqlens_k`, a separate per-batch input that this guard does not cover.
An attacker can set `total_sequence_length` small enough to pass the
guard while setting `seqlens_k[b]` far beyond `cos_cache.shape[0]`, so
that `position_id = seqlens_k[b]` indexes out of bounds into the cos/sin
cache. The out-of-bounds heap bytes are then used as rotation values and
propagate into the inference output.
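
The gap described above can be illustrated with a minimal sketch. The function names `ShapeGuardPasses` and `AnyPositionOutOfRange` are hypothetical stand-ins, not ONNX Runtime code, and `seqlens_k` is assumed to be a vector of 32-bit integers. The sketch only compares indices against the cache length and never performs the out-of-bounds read itself.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the existing shape guard: it compares the cache
// length with total_sequence_length only and never looks at seqlens_k.
bool ShapeGuardPasses(int64_t cos_cache_rows, int64_t total_sequence_length) {
  return cos_cache_rows >= total_sequence_length;
}

// Returns true if any position derived from seqlens_k would index at or past
// the end of the cache, which is the case the shape guard does not see.
bool AnyPositionOutOfRange(const std::vector<int32_t>& seqlens_k,
                           int64_t cos_cache_rows) {
  for (int32_t s : seqlens_k) {
    if (static_cast<int64_t>(s) >= cos_cache_rows) {
      return true;
    }
  }
  return false;
}
```

With a cache of 8 rows, `total_sequence_length = 4`, and `seqlens_k = {100000}`, the shape guard passes while the derived position lies far outside the cache.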
### Fix
Add an explicit bounds check in `Compute()` that rejects any
`seqlens_k[b] >= cos_cache.shape[0]` before position IDs are computed.
This is defense-in-depth alongside the existing `RunRotaryEmbedding`
position_ids validation added in #27597.
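
A minimal sketch of the kind of check described, written as a standalone helper rather than the actual change to `Compute()`. The name `ValidateSeqlensK`, the `std::optional<std::string>` error return, and the 32-bit element type are assumptions made for illustration. The real operator reports failure through its own status type.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not the ONNX Runtime implementation.
// Returns an error message for the first seqlens_k[b] >= cos_cache_rows,
// or std::nullopt when every entry is within the cache.
std::optional<std::string> ValidateSeqlensK(const std::vector<int32_t>& seqlens_k,
                                            int64_t cos_cache_rows) {
  for (std::size_t b = 0; b < seqlens_k.size(); ++b) {
    if (static_cast<int64_t>(seqlens_k[b]) >= cos_cache_rows) {
      return "seqlens_k[" + std::to_string(b) + "] = " +
             std::to_string(seqlens_k[b]) +
             " is out of range for a cos_cache with " +
             std::to_string(cos_cache_rows) + " rows";
    }
  }
  return std::nullopt;
}
```

Running the check before any position ID is derived means a rejected input never reaches the cache lookup. Reporting the offending batch index in the message is a design choice of this sketch, intended to make a rejected request easier to diagnose.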
### Security
- **Impact:** Heap out-of-bounds read (CWE-125). Adjacent heap memory
leaks into inference output via cos/sin rotation values.
- **Attack vector:** Any GQA-based LLM serving endpoint (Llama, Phi,
Mistral) that accepts `seqlens_k` as an inference input. No model
modification required.
### Testing
Verified that crafted inputs with `seqlens_k` exceeding `cos_cache`
dimensions now return `INVALID_ARGUMENT` instead of silently producing
results containing leaked heap data.