transformers
Fix GPT2 attention scaling ignored in SDPA/FlashAttention
#44397
Merged

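For context on the fix: GPT-2's attention scaling is configurable (GPT2Config exposes scale_attn_weights and scale_attn_by_inverse_layer_idx), but the report this PR addresses is that the configured scale was ignored when the model dispatched to the SDPA or FlashAttention backends, which then fell back to their default 1/sqrt(head_dim). The sketch below is illustrative only and is not the PR's actual diff; tensor shapes, `layer_idx`, and the flag values are example assumptions. It shows the general pattern of threading a precomputed scale into torch.nn.functional.scaled_dot_product_attention via its `scale` argument (available in recent PyTorch releases).

```python
# Illustrative sketch, not the PR's actual code: why a configured GPT-2 style
# attention scale must be passed to the SDPA backend explicitly.
import math
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 2, 4, 8
layer_idx = 3  # example value
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# GPT-2 style scaling: optionally 1/sqrt(head_dim), optionally further divided
# by (layer_idx + 1) when scale_attn_by_inverse_layer_idx is enabled.
scale_attn_weights = True
scale_attn_by_inverse_layer_idx = True
scaling = (1.0 / math.sqrt(head_dim)) if scale_attn_weights else 1.0
if scale_attn_by_inverse_layer_idx:
    scaling /= float(layer_idx + 1)

# Buggy pattern: without an explicit `scale`, SDPA applies its own
# 1/sqrt(head_dim) default and the configured scaling above is silently ignored.
out_ignored = F.scaled_dot_product_attention(q, k, v)

# Fixed pattern: thread the configured scale through to the backend.
out_scaled = F.scaled_dot_product_attention(q, k, v, scale=scaling)

print(torch.allclose(out_ignored, out_scaled))  # False when the two scales differ
```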

weiguangli-io
Fix GPT2 attention scaling config ignored in SDPA/FlashAttention back… (f5f29cba)
Sync scaling fix to DecisionTransformerGPT2Attention (Copied from GPT… (3c3a9e1d)
vasqu commented on 2026-03-02
Address review: use self.scaling in _upcast_and_reordered_attn, impro… (04f9ba9f)
vasqu commented on 2026-03-03
Address review: refactor eager_attention_forward to use scaling param… (802cd589)
vasqu approved these changes on 2026-03-03
Add issue reference to regression tests (309bb7fc)
vasqu Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling (d1dc716a)
vasqu enabled auto-merge (squash) 10 days ago
vasqu Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling (ca3206a9)
vasqu Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling (81269efa)
vasqu merged 8757098b into main 9 days ago
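To make the review commits above concrete (04f9ba9f, 802cd589): the pattern they describe is an eager attention helper that consumes a precomputed scaling value (the module's self.scaling) instead of re-deriving it from config inside the eager path, so the eager, SDPA, and FlashAttention backends all see the same factor. The following is a minimal, assumption-level sketch of that pattern, not the actual modeling_gpt2.py code; the signature and mask handling are simplified.

```python
# Minimal sketch (not the actual transformers implementation) of an eager
# attention forward that takes the precomputed `scaling` as a parameter,
# matching what the SDPA/FlashAttention backends receive.
from typing import Optional

import torch


def eager_attention_forward(
    query: torch.Tensor,                     # (batch, heads, q_len, head_dim)
    key: torch.Tensor,                       # (batch, heads, kv_len, head_dim)
    value: torch.Tensor,                     # (batch, heads, kv_len, head_dim)
    attention_mask: Optional[torch.Tensor],  # additive mask or None
    scaling: float,                          # e.g. the module's self.scaling
    dropout: float = 0.0,
):
    # Apply the caller-provided scale instead of recomputing it from config here.
    attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = torch.softmax(attn_weights, dim=-1)
    attn_weights = torch.nn.functional.dropout(attn_weights, p=dropout)
    attn_output = torch.matmul(attn_weights, value)
    return attn_output, attn_weights
```

With this shape, the caller computes self.scaling once (folding in scale_attn_weights and, when enabled, the inverse layer index) and passes the same value to whichever attention backend is selected.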
