onnxruntime
1e80c291 - Validate seqlens_k against cos_cache bounds in GroupQueryAttention to… (#28277)

Commit
6 days ago
### Description

Validate `seqlens_k` values against `cos_cache.shape[0]` in `GroupQueryAttention::Compute()` when `do_rotary` is enabled, to prevent out-of-bounds reads in the rotary embedding lookup.

### Root Cause

`CheckRotaryCaches()` validates `cos_cache.shape[0] >= total_sequence_length`, but runtime position IDs are derived from `seqlens_k`, a separate per-batch input. An attacker can set `total_sequence_length` small enough to pass the guard while setting `seqlens_k[b]` far beyond `cos_cache.shape[0]`, so that `position_id = seqlens_k[b]` indexes out of bounds into the cos/sin cache. The resulting heap bytes are used as rotation values and propagate into the inference output.

### Fix

Add an explicit bounds check in `Compute()` that rejects any `seqlens_k[b] >= cos_cache.shape[0]` before position IDs are computed. This is defense-in-depth alongside the existing `RunRotaryEmbedding` position_ids validation added in #27597.

### Security

- **Impact:** Heap OOB read (CWE-125). Adjacent heap memory leaks into inference output via cos/sin rotation values.
- **Attack vector:** Any GQA-based LLM serving endpoint (Llama, Phi, Mistral) that accepts `seqlens_k` as an inference input. No model modification required.

### Testing

Verified that crafted inputs with `seqlens_k` exceeding `cos_cache` dimensions now return `INVALID_ARGUMENT` instead of silently producing results containing leaked heap data.
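
The check described under Fix can be illustrated with a minimal, standalone sketch. This is not the code from the commit: the helper name `CheckSeqlensKAgainstCosCache`, the use of `std::vector<int32_t>` for `seqlens_k`, and the string error return are all assumptions made for illustration. It mirrors only the stated condition, rejecting any `seqlens_k[b] >= cos_cache.shape[0]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not ONNX Runtime's actual implementation.
// `cos_cache_rows` stands in for cos_cache.shape[0].
// Returns std::nullopt when every seqlens_k[b] is a valid row index into the
// cos/sin cache, otherwise a message describing the first offending entry.
std::optional<std::string> CheckSeqlensKAgainstCosCache(
    const std::vector<int32_t>& seqlens_k, int64_t cos_cache_rows) {
  for (std::size_t b = 0; b < seqlens_k.size(); ++b) {
    // Per the commit message, the position ID is derived from seqlens_k[b].
    const int64_t position_id = seqlens_k[b];
    if (position_id >= cos_cache_rows) {
      return "seqlens_k[" + std::to_string(b) + "] = " +
             std::to_string(position_id) +
             " is out of range for a cos_cache with " +
             std::to_string(cos_cache_rows) + " rows";
    }
  }
  return std::nullopt;  // All entries are within bounds.
}
```

A caller would run this before computing position IDs and turn a returned message into an invalid-argument error; with a cache of 8 rows, `{0, 3, 7}` passes while `{0, 8}` is rejected.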