transformers
GptOss experts implementation
#43227
Merged

ArthurZucker merged 35 commits into main from gpt-oss-experts-impl
IlyasMoutawwakil experts impl gpt oss
2aff4a88
IlyasMoutawwakil marked this pull request as draft 41 days ago
IlyasMoutawwakil no need to transpose dequantized experts
9958efba
IlyasMoutawwakil skip test_reverse_loading_mapping
b23e1ffa
IlyasMoutawwakil fix custom gating
e28f1555
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
e57d0a87
IlyasMoutawwakil revert transposition and simply support transposed experts to avoid m…
be08fe48
IlyasMoutawwakil style
e1dba4d3
IlyasMoutawwakil don't rely on weight shapes as they can be square matrices
0261a467
IlyasMoutawwakil commented on 2026-01-13
IlyasMoutawwakil marked this pull request as ready for review 39 days ago
IlyasMoutawwakil requested a review from vasqu 39 days ago
IlyasMoutawwakil no need to reload
5bd25c75
IlyasMoutawwakil fallback to eager
846adcad
IlyasMoutawwakil commented on 2026-01-14
IlyasMoutawwakil requested a review from ArthurZucker 38 days ago
ArthurZucker commented on 2026-01-14
IlyasMoutawwakil Update src/transformers/models/gpt_oss/modeling_gpt_oss.py
b1a71a79
vasqu commented on 2026-01-14
IlyasMoutawwakil fix
9dbed89b
IlyasMoutawwakil force 16 bytes alignment during weight loading
2f3fd11c
IlyasMoutawwakil simplify logic
dd377e19
IlyasMoutawwakil quantization conversions should be applied first
52e07786
IlyasMoutawwakil commented on 2026-01-15
IlyasMoutawwakil avoid baddbmm as it is less performant / less optimizable by max-auto…
1c491124
IlyasMoutawwakil no need for logger
4b0323ce
IlyasMoutawwakil requested a review from ArthurZucker 37 days ago
IlyasMoutawwakil requested a review from vasqu 37 days ago
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
aa34996f
IlyasMoutawwakil add comment explaining limitation
f094c319
IlyasMoutawwakil standardize operations and only reshape when needed
221f9bda
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
944afb5c
vasqu fixup conversion and test
1fc01dc3
vasqu commented on 2026-01-16
IlyasMoutawwakil Update src/transformers/conversion_mapping.py
d8207138
IlyasMoutawwakil force alignment docstring
71fdb18c
IlyasMoutawwakil move default apply gate
e852cbb0
IlyasMoutawwakil offsets
d698dcb4
vasqu approved these changes on 2026-01-16
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
5c2ca3cc
IlyasMoutawwakil add docs and make kernel_config optional
d6631bba
IlyasMoutawwakil use reshapes as they are equivalent to views when memory is contiguous
4f7226d8
ArthurZucker approved these changes on 2026-01-19
IlyasMoutawwakil fix and better notes
21173033
IlyasMoutawwakil reshapes instead of views
944a0eca
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
1a0ea125
IlyasMoutawwakil keep model saving and reloading in grouped_mm test to catch misalignm…
16e65366
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
75ab2759
IlyasMoutawwakil Merge branch 'main' into gpt-oss-experts-impl
711a652b
IlyasMoutawwakil force pushed from 36ff79a1 to 711a652b 31 days ago
ArthurZucker approved these changes on 2026-01-22
ArthurZucker merged 2d4d8fe4 into main 31 days ago
ArthurZucker deleted the gpt-oss-experts-impl branch 31 days ago