Add validation of position_ids in RotaryEmbedding operators (#27597)
## Description
Fix an out-of-bounds read in the RotaryEmbedding operator when
user-provided `position_ids` values fall outside the bounds of the cos/sin
cache (`[0, max_sequence_length)`).
### Problem
When `position_ids` contains values that are negative or >=
`max_sequence_length`, the kernel computes `cache_offset = position_id *
half_rotary_embedding_dim` and reads outside `cos_cache` /
`sin_cache`. This is undefined behavior and can produce incorrect results,
crashes, or reads of unrelated memory into the output.
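To make the arithmetic concrete: each position reads one contiguous row of `half_rotary_embedding_dim` values starting at `cache_offset`. The sketch below uses a hypothetical helper, `CacheRowInBounds`, which is not part of the operator. It assumes each cache holds `max_sequence_length` rows and that the offset product does not overflow `int64_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the operator's code). Assumes cos_cache and
// sin_cache each hold max_sequence_length rows of half_rotary_embedding_dim
// values, and that the kernel reads the row starting at cache_offset.
// Returns true only if that whole row lies inside the cache.
// Precondition: the products below do not overflow int64_t.
bool CacheRowInBounds(int64_t position_id, int64_t max_sequence_length,
                      int64_t half_rotary_embedding_dim) {
  assert(max_sequence_length > 0 && half_rotary_embedding_dim > 0);
  const int64_t cache_size = max_sequence_length * half_rotary_embedding_dim;
  const int64_t cache_offset = position_id * half_rotary_embedding_dim;
  // Last element the kernel would touch for this position.
  const int64_t last_index = cache_offset + half_rotary_embedding_dim - 1;
  return cache_offset >= 0 && last_index < cache_size;
}
```

With `max_sequence_length = 8` and `half_rotary_embedding_dim = 16` (a 128-element cache), position 7 reads indices 112..127 and is in bounds. Position 8 (indices 128..143) and position -1 (indices -16..-1) are not. The row is in bounds exactly when `0 <= position_id < max_sequence_length`, which is the condition the fix checks.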
### Fix
**CPU (`rotary_embedding.cc`):**
- Added upfront validation of all `position_ids` values before the
parallel computation loop. Returns an `INVALID_ARGUMENT` error if any
value is outside the range `[0, max_sequence_length)`.
- Validation is applied only when `position_ids_format != 0` (i.e., when
`position_ids` is explicitly provided). When `position_ids` is not
provided (format 0), the cache is shaped `(B, S, H/2)`, so the row index
`b * S + s` is always in bounds by construction.
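A minimal sketch of the CPU-side check follows. It assumes a flat view of `position_ids` and uses `std::optional<std::string>` as a hypothetical stand-in for the operator's `INVALID_ARGUMENT` status. `ValidatePositionIds` is an illustrative name, not the function added by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the operator's status: std::nullopt means
// success, otherwise the string is the INVALID_ARGUMENT message.
using ValidationError = std::optional<std::string>;

// Sketch of a check run once, before the parallel loop.
// position_ids_format == 0 means position_ids was not provided; the cache
// is then indexed by b * S + s, which is in bounds by construction.
ValidationError ValidatePositionIds(const std::vector<int64_t>& position_ids,
                                    int position_ids_format,
                                    int64_t max_sequence_length) {
  if (position_ids_format == 0) {
    return std::nullopt;
  }
  for (size_t i = 0; i < position_ids.size(); ++i) {
    const int64_t pos = position_ids[i];
    // Compare the id itself rather than pos * half_dim, so a huge id is
    // rejected before any offset arithmetic that could overflow.
    if (pos < 0 || pos >= max_sequence_length) {
      return "position_ids[" + std::to_string(i) + "] = " +
             std::to_string(pos) + " is out of range [0, " +
             std::to_string(max_sequence_length) + ")";
    }
  }
  return std::nullopt;
}
```

Running the check once, up front, keeps the per-element work in the parallel loop unchanged and reports the first offending index rather than failing inside a worker thread.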
**CUDA (`rotary_embedding_impl.cu`):**
- Plumbed the previously unused `max_sequence_length` parameter through
to the kernel.
- Added a bounds check inside the `position_ids_format != 0` branch.
For an out-of-bounds position ID, the kernel copies the input to the
output unchanged, because errors cannot be propagated from inside a GPU
kernel.
- The check is deliberately limited to the `position_ids_format != 0`
branch. When format is 0 (no `position_ids`), the cache is `(B*S, H/2)`
and `b_s_index = b * S + s` is always valid; applying the check
unconditionally would incorrectly reject every batch after the first,
because `max_sequence_length == sequence_length` in that case.
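The CUDA behavior can be approximated on the host in C++. `RotateToken` below is a hypothetical, single-threaded emulation of the per-token work, not the kernel in `rotary_embedding_impl.cu`. It assumes the non-interleaved ("rotate half") layout, a rotary dimension that covers the whole head, and `position_ids` of shape `(B, S)` when `position_ids_format != 0`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side emulation of what the kernel does for one (b, s)
// token of one head. input/output point at that token's 2 * half_dim values.
void RotateToken(const float* input, float* output,
                 const float* cos_cache, const float* sin_cache,
                 const int64_t* position_ids, int position_ids_format,
                 int64_t b, int64_t s, int64_t sequence_length,
                 int64_t max_sequence_length, int64_t half_dim) {
  const int64_t b_s_index = b * sequence_length + s;
  int64_t cache_row;
  if (position_ids_format != 0) {
    assert(position_ids != nullptr);
    const int64_t pos = position_ids[b_s_index];
    // A kernel cannot return an error, so an out-of-range id passes the
    // token through unchanged instead of reading outside the cache.
    if (pos < 0 || pos >= max_sequence_length) {
      for (int64_t i = 0; i < 2 * half_dim; ++i) output[i] = input[i];
      return;
    }
    cache_row = pos;
  } else {
    // No position_ids: the cache has B*S rows, so b_s_index is valid by
    // construction. Checking it against max_sequence_length (== S here)
    // would wrongly reject every batch with b > 0.
    cache_row = b_s_index;
  }
  const float* cos_row = cos_cache + cache_row * half_dim;
  const float* sin_row = sin_cache + cache_row * half_dim;
  // Rotate-half RoPE: pair element i with element i + half_dim.
  for (int64_t i = 0; i < half_dim; ++i) {
    const float x1 = input[i];
    const float x2 = input[i + half_dim];
    output[i] = x1 * cos_row[i] - x2 * sin_row[i];
    output[i + half_dim] = x2 * cos_row[i] + x1 * sin_row[i];
  }
}
```

For example, with `sequence_length = 1`, a two-row cache, `position_ids_format = 0`, `b = 1`, and `max_sequence_length = 1`, the function still reads cache row 1; a check against `max_sequence_length` in that branch would have rejected this valid token.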
### Tests
Added three CPU test cases for the ONNX domain `RotaryEmbedding` op:
- `RotaryEmbedding_PositionIds_ExceedsMaxSeqLen` — position_id far
exceeding cache size
- `RotaryEmbedding_PositionIds_Negative` — negative position_id
- `RotaryEmbedding_PositionIds_OOB_InBatch` — OOB position_id in a
multi-batch, multi-sequence scenario
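The three scenarios can be pictured as concrete `position_ids` tensors. The values below are illustrative assumptions, not the ones used in the actual tests. `FirstOutOfRange` is a hypothetical helper that also reports which `(b, s)` entry is invalid in the multi-batch case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper: scans a row-major (batch_size, sequence_length)
// position_ids tensor and returns the (b, s) coordinates of the first id
// outside [0, max_sequence_length), or std::nullopt if every id is valid.
std::optional<std::pair<int64_t, int64_t>> FirstOutOfRange(
    const std::vector<int64_t>& position_ids, int64_t sequence_length,
    int64_t max_sequence_length) {
  assert(sequence_length > 0);
  for (size_t i = 0; i < position_ids.size(); ++i) {
    const int64_t pos = position_ids[i];
    if (pos < 0 || pos >= max_sequence_length) {
      const int64_t flat = static_cast<int64_t>(i);
      return std::make_pair(flat / sequence_length, flat % sequence_length);
    }
  }
  return std::nullopt;
}
```

For a `(2, 3)` tensor `{0, 1, 2, 3, 4, 99}` with `max_sequence_length = 8`, the helper returns `(1, 2)`; a single-element `{1000}` or `{-1}` returns `(0, 0)`.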
### Motivation and Context
Security hardening — prevent out-of-bounds memory access from untrusted
model inputs.