onnxruntime
7c56fa83 - Add seqlens_k bounds validation in GroupQueryAttention to prevent GEMM OOB (#28031)

Commit

24 days ago

Add seqlens_k bounds validation in GroupQueryAttention to prevent GEMM OOB (#28031)

### Description

Validate seqlens_k tensor values in the CPU GroupQueryAttention operator before they are used as GEMM dimensions. Without this check, a crafted model can supply negative or oversized seqlens_k values that cause out-of-bounds reads in the K/V present cache buffers.

Fixes https://portal.microsofticm.com/imp/v5/incidents/details/31000000559235/summary

### Changes

- **group_query_attention.cc**: Add validation loop in `Compute()` before any seqlens_k access:
  - `seqlens_k[b] >= 0` (prevents unsigned wraparound in `static_cast<size_t>`)
  - `seqlens_k[b] + 1 <= present_kv_seqlen` (prevents GEMM reading past K/V buffer)
  - For non-first-prompt: `seqlens_k[b] + 1 >= sequence_length` (prevents underflow in `past_seqlen = total_seqlen - sequence_length`)
- **group_query_attention_helper.h**: Fix seqlens_k shape validation (`&&` to `||`) so wrong-length tensors are correctly rejected
- **Tests**: 4 regression tests covering negative, oversized, multi-batch, and boundary-valid seqlens_k values

### Motivation and Context

MSRC case 108962: A crafted model can set seqlens_k values that, when cast to `size_t` and used as GEMM N dimension, cause heap OOB reads from the present K/V cache buffers.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
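
For illustration, the three bounds checks listed in the commit message could be sketched roughly as below. This is not the code from `group_query_attention.cc`: the function name `ValidateSeqlensK`, its parameters, the error strings, and the convention of returning an empty string on success are all assumptions made for this example. It also assumes `seqlens_k` holds `int32_t` values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not the onnxruntime implementation): returns an empty
// string when every seqlens_k entry is in bounds, otherwise an error message.
std::string ValidateSeqlensK(const std::vector<int32_t>& seqlens_k,
                             int64_t sequence_length,
                             int64_t present_kv_seqlen,
                             bool is_first_prompt) {
  for (std::size_t b = 0; b < seqlens_k.size(); ++b) {
    const int64_t value = seqlens_k[b];
    // Check 1: a negative value would wrap to a huge number once cast to size_t.
    if (value < 0) {
      return "seqlens_k[" + std::to_string(b) + "] must be non-negative";
    }
    // Widened to 64 bits so that value + 1 cannot overflow int32_t.
    const int64_t total_seqlen = value + 1;
    // Check 2: the GEMM must not read past the present K/V buffer.
    if (total_seqlen > present_kv_seqlen) {
      return "seqlens_k[" + std::to_string(b) + "] + 1 exceeds present_kv_seqlen";
    }
    // Check 3 (non-first-prompt only): keeps
    // past_seqlen = total_seqlen - sequence_length from going negative.
    if (!is_first_prompt && total_seqlen < sequence_length) {
      return "seqlens_k[" + std::to_string(b) + "] + 1 is less than sequence_length";
    }
  }
  return std::string();
}
```

Under these assumptions, a call such as `ValidateSeqlensK({3, 7}, 1, 8, false)` passes, since `7 + 1 <= 8` is the boundary-valid case, while `{-1}` and `{8}` with the same arguments are rejected. Running the checks before any cast to `size_t` is the point of the ordering: after the cast, a negative value can no longer be told apart from a very large one.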

References

#28031 - Add seqlens_k bounds validation in GroupQueryAttention to prevent GEMM OOB

Author

vraspar

Parents

fb13eb3e
