onnxruntime
d4b01e43 - Fix CPU Attention causal mask alignment (#29050)

Commit
6 days ago
Fix CPU Attention causal mask alignment (#29050)

## Summary

- align CPU ONNX Attention causal masking with upper-left behavior for q_len=1, kv_len>1, no past
- preserve the existing `nonpad_kv_seqlen` / TensorScatter single-query causal behavior
- update Python attention reference causal mask to model ONNX upper-left alignment with an explicit past offset
- add a regression test for issue #29020

Fixes #29020

## Validation

- `python -m py_compile onnxruntime/test/python/transformers/test_onnx_attention/common.py onnxruntime/test/python/transformers/test_onnx_attention/test_mha.py onnxruntime/test/python/transformers/test_onnx_attention/test_gqa.py onnxruntime/test/python/transformers/test_onnx_attention/test_tensorscatter_attention.py`
- `git diff --check`

Notes:

- `pytest onnxruntime/test/python/transformers/test_onnx_attention/test_tensorscatter_attention.py -k "cpu_fp32 and causal" -q` could not run locally because this Python environment does not have `onnx` / `onnxruntime` installed.
- After the latest follow-up commit, an incremental rebuild of `onnxruntime_provider_test` was attempted but failed in MSBuild before compiling this change due to a local environment issue: duplicate `Path` / `PATH` environment keys when launching `CL.exe`.
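For readers unfamiliar with the alignment terminology in the summary, the following is a minimal sketch of an upper-left aligned causal mask with an explicit past offset. It is not the code from `common.py` or the CPU kernel; the function names `upper_left_causal_mask` and `lower_right_causal_mask` and the boolean list-of-lists layout are assumptions made for illustration only. Under this reading, query row `i` may attend to key column `j` when `j <= i + past_len`, so with q_len=1, kv_len>1, and no past, only the first key is visible.

```python
def upper_left_causal_mask(q_len, kv_len, past_len=0):
    """Hypothetical helper: True means query i may attend to key j.

    Upper-left alignment anchors the causal diagonal at the top-left
    corner of the (q_len x kv_len) score matrix, shifted right by an
    explicit past offset.
    """
    return [[j <= i + past_len for j in range(kv_len)] for i in range(q_len)]


def lower_right_causal_mask(q_len, kv_len):
    """Hypothetical helper for contrast: diagonal anchored at the bottom-right.

    Equivalent to an upper-left mask whose offset is implied as
    kv_len - q_len rather than passed explicitly.
    """
    return upper_left_causal_mask(q_len, kv_len, past_len=kv_len - q_len)


if __name__ == "__main__":
    # The q_len=1, kv_len>1, no-past case named in the summary.
    print(upper_left_causal_mask(1, 4))
    # The same shapes under lower-right alignment see every key.
    print(lower_right_causal_mask(1, 4))
    # With an explicit past offset of 2, the first query sees three keys.
    print(upper_left_causal_mask(2, 4, past_len=2))
```

The two alignments coincide when q_len equals kv_len and differ otherwise, which is why the single-query case without past is a useful regression target.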