Fix GPT-2 no-past attention fusion for transformers >= 4.27 (#27449)
## Summary
- Fix `FusionGptAttentionNoPast` mask pattern matching to support both
`torch.uint8` (old) and `torch.bool` (new) causal masks
- Add synthetic ONNX graph generator and unit test for the no-past
attention fusion path
## Motivation
Fixes #16453
In `transformers >= 4.27` (Feb 2023), the causal attention mask dtype
changed from `torch.uint8` to `torch.bool`
([commit](https://github.com/huggingface/transformers/commit/c51dc4f92755c67a83f3fc8a0bd6b3e64df199e4)).
This removed a `Cast` node from the exported ONNX graph.
`FusionGptAttentionNoPast.fuse()` hardcoded `Cast` as the first element
in `match_parent_path`, causing the mask path match to fail silently for
all modern transformers exports. The result: **zero Attention nodes
fused** for any GPT-2 model exported without past state.
The sibling class `FusionGptAttention` (with-past) was already fixed to
handle both patterns using `match_parent_paths` (plural). This PR
applies the same approach to the no-past variant.
## Changes
### `fusion_gpt_attention_no_past.py`
- Replace `match_parent_path` with `match_parent_paths` for the
Where-based mask path (lines 187-201), trying both the Cast-prefixed
pattern (older transformers) and the Cast-less pattern (transformers >= 4.27)
- Remove stale TODO comment that noted the fusion "stopped working"
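The idea behind the fix can be illustrated with a toy model: instead of one hardcoded op path, a list of candidate paths is tried in order and the first hit wins. This is a simplified sketch of the pattern, not the real onnxruntime `OnnxModel` API; the single-parent `Node` class and the `Cast`/`Slice` chain are illustrative stand-ins:

```python
# Toy illustration of single-path vs. multi-path parent matching,
# mimicking the idea behind match_parent_path / match_parent_paths.
# The graph model and op chain are simplified stand-ins, not the
# actual onnxruntime helpers.

class Node:
    def __init__(self, op_type, parent=None):
        self.op_type = op_type
        self.parent = parent  # single-parent chain for simplicity

def match_parent_path(node, ops):
    """Walk up the parent chain; return matched nodes or None."""
    matched = []
    current = node.parent
    for op in ops:
        if current is None or current.op_type != op:
            return None
        matched.append(current)
        current = current.parent
    return matched

def match_parent_paths(node, paths):
    """Try each candidate path in order; return (index, nodes) of
    the first match, or (-1, None) if none match."""
    for i, ops in enumerate(paths):
        nodes = match_parent_path(node, ops)
        if nodes is not None:
            return i, nodes
    return -1, None

# Old export inserts a Cast before the mask ops; new export drops it.
old_chain = Node("Where", Node("Cast", Node("Slice")))
new_chain = Node("Where", Node("Slice"))

paths = [["Cast", "Slice"], ["Slice"]]  # Cast-prefixed first, then Cast-less

print(match_parent_paths(old_chain, paths)[0])  # matches path 0
print(match_parent_paths(new_chain, paths)[0])  # matches path 1
```

With the single-path matcher, the `new_chain` case returns `None` and the fusion silently bails out, which is exactly the failure mode the PR fixes.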
### `gpt2_model_generator.py`
- Add `create_gpt2_attention_no_past()` function that builds a synthetic
GPT-2 no-past attention graph with the Where-based mask pattern
- Supports `add_cast` parameter to test both mask variants
### `test_attention_fusion.py`
- Add `test_gpt2_attention_no_past_fusion()` that verifies an Attention
node is fused for all combinations of `add_cast` and `switch_add_inputs`
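The four-variant sweep can be driven by a product over the two flags. A minimal sketch of the loop shape, where `build_model_and_fuse` is a hypothetical stand-in for the real generator-plus-optimizer calls in the test:

```python
# Sketch of iterating the four test variants (add_cast x switch_add_inputs).
# build_model_and_fuse is a hypothetical placeholder for the real
# create_gpt2_attention_no_past() + fusion + node-count steps.
from itertools import product

variants = list(product([False, True], repeat=2))

for add_cast, switch_add_inputs in variants:
    # Real test: build the synthetic graph with these flags, run the
    # attention fusion pass, then assert one Attention node exists.
    print(f"add_cast={add_cast}, switch_add_inputs={switch_add_inputs}")

print(len(variants))  # 4 combinations
```

Enumerating both flags guards against a regression in either mask variant or Add-input ordering slipping through unnoticed.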
## Test Plan
- [x] New test `test_gpt2_attention_no_past_fusion` passes (4 variants:
with/without Cast × normal/switched Add inputs)
- [x] All existing attention fusion tests pass (10/10)
- [x] Lint clean on modified files (`lintrunner` reports no issues for
new code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>