transformers
Fix GPT2 attention scaling ignored in SDPA/FlashAttention #44397
Merged
vasqu merged 8 commits into huggingface:main from weiguangli-io:codex/transformers-44380-gpt2-sdpa-scaling
f5f29cba  Fix GPT2 attention scaling config ignored in SDPA/FlashAttention back…
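The gist of the fix, as a minimal sketch rather than the PR's exact diff: fold GPT2's `scale_attn_weights` and `scale_attn_by_inverse_layer_idx` config flags into one softmax scale and pass it explicitly to `torch.nn.functional.scaled_dot_product_attention` via its `scale` argument (the FlashAttention path forwards the same value as `softmax_scale`). With `scale=None`, SDPA silently applies `1/sqrt(head_dim)` and the config flags are ignored, which is the reported bug. The function name and signature below are illustrative.

```python
import math

import torch
import torch.nn.functional as F


# Minimal sketch (illustrative, not the PR's exact diff): derive the softmax
# scale from the GPT2 config flags instead of relying on SDPA's implicit
# 1/sqrt(head_dim) default.
def gpt2_sdpa_attention(query, key, value, *, scale_attn_weights=True,
                        scale_attn_by_inverse_layer_idx=False, layer_idx=0):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    scaling = 1.0
    if scale_attn_weights:                    # config.scale_attn_weights
        scaling /= math.sqrt(query.size(-1))  # 1/sqrt(head_dim)
    if scale_attn_by_inverse_layer_idx:       # config.scale_attn_by_inverse_layer_idx
        scaling /= float(layer_idx + 1)
    # Passing scale= explicitly is the whole point: scale=None would make
    # SDPA apply 1/sqrt(head_dim) regardless of the config.
    return F.scaled_dot_product_attention(query, key, value,
                                          is_causal=True, scale=scaling)
```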
3c3a9e1d  Sync scaling fix to DecisionTransformerGPT2Attention (Copied from GPT…
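DecisionTransformer embeds its own copy of the GPT2 attention class, which transformers keeps in sync through `# Copied from` markers consumed by `make fix-copies`; that is why the scaling fix lands here as a separate commit. Schematically (the exact marker text in `modeling_decision_transformer.py` may differ):

```python
import torch.nn as nn


# Schematic of the repo's copy-sync convention: the marker tells
# `make fix-copies` to mirror edits from the GPT2 original into this class.
# Copied from transformers.models.gpt2.modeling_gpt2.GPT2Attention with GPT2->DecisionTransformerGPT2
class DecisionTransformerGPT2Attention(nn.Module):
    ...
```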
vasqu commented on 2026-03-02
04f9ba9f  Address review: use self.scaling in _upcast_and_reordered_attn, impro…
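GPT2's optional upcast-and-reorder path (enabled by `config.reorder_and_upcast_attn`) computes `QK^T` in fp32 through `torch.baddbmm`; this commit has it consume the precomputed `self.scaling` rather than re-deriving only part of the factor locally. A hedged sketch of the idea, with names simplified:

```python
import torch


# Sketch of the upcast path: fold the full configured scale into baddbmm's
# alpha so the scaling is applied in fp32 along with the matmul.
def upcast_and_reordered_scores(query, key, scaling):
    # query: (batch*heads, q_len, head_dim); key: (batch*heads, head_dim, k_len)
    attn_weights = torch.empty(query.size(0), query.size(1), key.size(-1),
                               dtype=torch.float32, device=query.device)
    # beta=0 ignores the uninitialized buffer; alpha multiplies Q @ K.
    return torch.baddbmm(attn_weights, query.float(), key.float(),
                         beta=0, alpha=scaling)
```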
vasqu commented on 2026-03-03
802cd589  Address review: refactor eager_attention_forward to use scaling param…
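The eager path is refactored to the shared attention-interface shape used across transformers models, where the scale arrives as an explicit `scaling` argument so eager, SDPA, and FlashAttention all consume the same value. A sketch of that pattern (mask slicing and head-mask handling omitted):

```python
import torch
import torch.nn as nn


def eager_attention_forward(module, query, key, value, attention_mask,
                            scaling, dropout=0.0, **kwargs):
    # Explicit scaling replaces the old in-function 1/sqrt(head_dim) math.
    attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = nn.functional.softmax(attn_weights, dim=-1)
    attn_weights = nn.functional.dropout(attn_weights, p=dropout,
                                         training=module.training)
    attn_output = torch.matmul(attn_weights, value)
    return attn_output.transpose(1, 2).contiguous(), attn_weights
```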
vasqu approved these changes on 2026-03-03
309bb7fc  Add issue reference to regression tests
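The tests now cite the originating issue (#44380, judging by the branch name). Below is a minimal sketch of the kind of regression check involved, not the PR's actual test; toggling `config._attn_implementation` between runs assumes a transformers version that dispatches the attention backend from the config at forward time:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel


def test_sdpa_respects_scaling_config():
    config = GPT2Config(n_layer=2, n_head=4, n_embd=64,
                        scale_attn_by_inverse_layer_idx=True)
    torch.manual_seed(0)
    model = GPT2LMHeadModel(config).eval()
    input_ids = torch.randint(0, config.vocab_size, (1, 16))

    model.config._attn_implementation = "eager"
    with torch.no_grad():
        eager_logits = model(input_ids).logits

    model.config._attn_implementation = "sdpa"
    with torch.no_grad():
        sdpa_logits = model(input_ids).logits

    # Before the fix, SDPA dropped the 1/(layer_idx + 1) factor and the
    # outputs diverged; after it they agree to float tolerance.
    torch.testing.assert_close(sdpa_logits, eager_logits, rtol=1e-4, atol=1e-4)
```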
d1dc716a  Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling
vasqu enabled auto-merge (squash) 10 days ago
ca3206a9  Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling
81269efa  Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling
vasqu merged 8757098b into main 9 days ago
Reviewers: vasqu
Assignees: No one assigned
Labels: None yet
Milestone: No milestone