transformers
Fix GPT2 attention scaling ignored in SDPA/FlashAttention
#44397
Merged

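For context on the fix: GPT-2's attention scaling is configurable (GPT2Config exposes scale_attn_weights and scale_attn_by_inverse_layer_idx), but the report this PR addresses is that the configured scale was ignored when the model dispatched to the SDPA or FlashAttention backends, which then fell back to their default 1/sqrt(head_dim). The sketch below is illustrative only and is not the PR's actual diff; tensor shapes, `layer_idx`, and the flag values are example assumptions. It shows the general pattern of threading a precomputed scale into torch.nn.functional.scaled_dot_product_attention via its `scale` argument (available in recent PyTorch releases).

```python
# Illustrative sketch, not the PR's actual code: why a configured GPT-2 style
# attention scale must be passed to the SDPA backend explicitly.
import math
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 2, 4, 8
layer_idx = 3  # example value
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# GPT-2 style scaling: optionally 1/sqrt(head_dim), optionally further divided
# by (layer_idx + 1) when scale_attn_by_inverse_layer_idx is enabled.
scale_attn_weights = True
scale_attn_by_inverse_layer_idx = True
scaling = (1.0 / math.sqrt(head_dim)) if scale_attn_weights else 1.0
if scale_attn_by_inverse_layer_idx:
    scaling /= float(layer_idx + 1)

# Buggy pattern: without an explicit `scale`, SDPA applies its own
# 1/sqrt(head_dim) default and the configured scaling above is silently ignored.
out_ignored = F.scaled_dot_product_attention(q, k, v)

# Fixed pattern: thread the configured scale through to the backend.
out_scaled = F.scaled_dot_product_attention(q, k, v, scale=scaling)

print(torch.allclose(out_ignored, out_scaled))  # False when the two scales differ
```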

weiguangli-io
Fix GPT2 attention scaling config ignored in SDPA/FlashAttention back… (f5f29cba)
Sync scaling fix to DecisionTransformerGPT2Attention (Copied from GPT… (3c3a9e1d)
vasqu commented on 2026-03-02
Address review: use self.scaling in _upcast_and_reordered_attn, impro… (04f9ba9f)
vasqu commented on 2026-03-03
Address review: refactor eager_attention_forward to use scaling param… (802cd589)
vasqu approved these changes on 2026-03-03
Add issue reference to regression tests (309bb7fc)
vasqu Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling (d1dc716a)
vasqu enabled auto-merge (squash) 10 days ago
vasqu Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling (ca3206a9)
vasqu Merge branch 'main' into codex/transformers-44380-gpt2-sdpa-scaling (81269efa)
vasqu merged 8757098b into main 9 days ago
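To make the review commits above concrete (04f9ba9f, 802cd589): the pattern they describe is an eager attention helper that consumes a precomputed scaling value (the module's self.scaling) instead of re-deriving it from config inside the eager path, so the eager, SDPA, and FlashAttention backends all see the same factor. The following is a minimal, assumption-level sketch of that pattern, not the actual modeling_gpt2.py code; the signature and mask handling are simplified.

```python
# Minimal sketch (not the actual transformers implementation) of an eager
# attention forward that takes the precomputed `scaling` as a parameter,
# matching what the SDPA/FlashAttention backends receive.
from typing import Optional

import torch


def eager_attention_forward(
    query: torch.Tensor,                     # (batch, heads, q_len, head_dim)
    key: torch.Tensor,                       # (batch, heads, kv_len, head_dim)
    value: torch.Tensor,                     # (batch, heads, kv_len, head_dim)
    attention_mask: Optional[torch.Tensor],  # additive mask or None
    scaling: float,                          # e.g. the module's self.scaling
    dropout: float = 0.0,
):
    # Apply the caller-provided scale instead of recomputing it from config here.
    attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = torch.softmax(attn_weights, dim=-1)
    attn_weights = torch.nn.functional.dropout(attn_weights, p=dropout)
    attn_output = torch.matmul(attn_weights, value)
    return attn_output, attn_weights
```

With this shape, the caller computes self.scaling once (folding in scale_attn_weights and, when enabled, the inverse layer index) and passes the same value to whichever attention backend is selected.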
