onnxruntime
37a3b51c - Fix bounds in WhisperDecoderSubgraph::CreateInitialFeeds initial feeds (#29239)

Committed 6 days ago
### Description

`WhisperDecoderSubgraph::CreateInitialFeeds` constructed the decoder initial feeds using a single value that mixed a **byte count** with an **element count**. The total size was computed as `cur_len * batch_beam_size * sizeof(int)` (bytes) and then reused as:

- the element count for the int32 staging buffer (`MakeUniquePtr<int>`), and
- the element count for the `gsl::span<int>` source/destination passed to the device copy.

The `input_ids` tensor is allocated for exactly `batch_beam_size * cur_len` int32 elements. The spans therefore claimed 4x its real extent (the extra `sizeof(int)` factor is 4), so the device copy ran past the end of the buffer. The per-beam `memcpy` also used the same combined value as its length, instead of the byte size of a single sequence.

The fix mirrors the correct T5 sibling (`subgraph_t5_decoder.cc`). That file separates the element count, used for the spans and the staging allocation, from the per-sequence byte count, used for the `memcpy`.

### Changes

- `subgraph_whisper_decoder.cc`: `total_size` is now the element count `cur_len * batch_beam_size`. A new `sequence_bytes = cur_len * sizeof(int32_t)` is used for the per-beam `memcpy`. The staging buffer and spans use `int32_t` consistently to match the `int32_t` tensors and sequences.
- Added regression test `BeamSearchTest.DummyWhisperWithSequenceInputIds` (CPU, and CUDA under `USE_CUDA`), which exercises the `use_sequence_as_input_ids` path using a deterministic dummy model and its generator script. The test validates both the `sequences` and `scores` outputs.

### Related bool-tensor normalization fixes

While exercising the Whisper path, bool tensors copied from raw data could hold non-canonical byte values (any non-zero byte rather than strictly `{0, 1}`). This caused provider-dependent behavior.
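The normalization idea can be sketched in a few lines. This is a minimal illustration, not the onnxruntime `UnpackTensor<bool>` code. `NormalizeRawBools` is a hypothetical helper. Copying the raw bytes verbatim (for example with `memcpy`) would carry a byte such as `0xFF` straight into bool storage. Writing `raw[i] != 0` instead stores a canonical `false`/`true` for every element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the guard described for UnpackTensor<bool>: the loop treats each bool as one byte.
static_assert(sizeof(bool) == 1, "byte-wise normalization assumes a 1-byte bool");

// Hypothetical helper (not the onnxruntime implementation). It converts raw
// serialized bytes into bool storage. Zero maps to false; every non-zero byte
// (0x01, 0x02, 0xFF, ...) maps to true, so each stored bool is canonical.
void NormalizeRawBools(const uint8_t* raw, size_t count, bool* out) {
  assert(count == 0 || (raw != nullptr && out != nullptr));
  for (size_t i = 0; i < count; ++i) {
    out[i] = raw[i] != 0;  // never copy the source byte itself
  }
}
```

For example, the raw bytes `{0x00, 0x01, 0xFF, 0x02}` normalize to `{false, true, true, true}`. On mainstream ABIs these are stored as the bytes `{0, 1, 1, 1}`.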
To keep the fix self-contained, the following normalization changes are included:

- `tensorprotoutils.cc`: `UnpackTensor<bool>` normalizes raw-data bytes to `{0, 1}`. A `static_assert(sizeof(bool) == 1)` guards the byte-wise loop.
- `compress_impl.cu` (CUDA `Compress`): the prefix-sum sizing predicate normalizes bool bytes to `{0, 1}`, so the output sizing agrees with the element-selection truthiness check. Because bool initializers are now normalized on unpack, the remaining exposure is runtime-produced bool condition tensors.
- Added `CompressTest.Compress_cuda_non_canonical_bool_condition` (under `USE_CUDA`). It feeds a raw `0xFF` condition byte through a session-level run, because `OpTester` normalizes bool inputs and so cannot reproduce this case. The test asserts that the Compress output is sized by truthiness rather than by the sign-extended byte value.

### Motivation

The decoder shares one implementation file across CPU/CUDA/ROCm, so this single change covers all execution providers. The previous behavior could overrun the staging/feed buffers for models that drive the sequence-as-input-ids decoder path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
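The element-count versus byte-count split from the Description can be illustrated with a small host-side sketch. It is not the onnxruntime code. `PackSequences` is a hypothetical stand-in for the staging step of `CreateInitialFeeds`. It uses `std::make_unique` in place of onnxruntime's `MakeUniquePtr` and allocator, and omits the device copy. The point is that one quantity sizes the buffer (elements) and a different one bounds each `memcpy` (bytes of one sequence).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for the staging step (not the onnxruntime code). It
// packs batch_beam_size sequences of cur_len int32 tokens into one contiguous
// buffer laid out as [batch_beam_size, cur_len].
std::unique_ptr<int32_t[]> PackSequences(const std::vector<std::vector<int32_t>>& sequences,
                                         int batch_beam_size, int cur_len) {
  assert(static_cast<int>(sequences.size()) == batch_beam_size);

  // Element count: sizes the staging buffer and any span over it.
  // (The buggy version used cur_len * batch_beam_size * sizeof(int) here, a
  // byte count that overstates the int32 element count by 4x.)
  const size_t total_size = static_cast<size_t>(cur_len) * batch_beam_size;
  // Byte count of one sequence: the length of each per-beam memcpy.
  const size_t sequence_bytes = static_cast<size_t>(cur_len) * sizeof(int32_t);

  auto staging = std::make_unique<int32_t[]>(total_size);
  for (size_t i = 0; i < sequences.size(); ++i) {
    assert(sequences[i].size() >= static_cast<size_t>(cur_len));
    // Row i starts i * cur_len elements into the buffer.
    std::memcpy(staging.get() + i * static_cast<size_t>(cur_len),
                sequences[i].data(), sequence_bytes);
  }
  return staging;
}
```

With two beams of length 3, the buffer holds exactly 6 `int32_t` values and each copy moves 12 bytes. A span built from `total_size` then matches the allocation exactly.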