PR #43227 GptOss experts implementation

ArthurZucker merged 35 commits into main from gpt-oss-experts-impl

experts impl gpt oss

2aff4a88

IlyasMoutawwakil marked this pull request as draft 161 days ago

no need to transpose dequantized experts

9958efba

skip test_reverse_loading_mapping

b23e1ffa

fix custom gating

e28f1555

Merge branch 'main' into gpt-oss-experts-impl

e57d0a87

revert transposition and simply support transposed experts to avoid m…

be08fe48

style

e1dba4d3

don't rely on weight shapes as they can be square matrices

0261a467
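
To illustrate why shape-based detection is fragile, here is a minimal sketch. It is not taken from the PR: the helper `guess_is_transposed`, the 2D shapes, and the convention that `(in_features, out_features)` means "not transposed" are all assumptions made for the example. When the two dimensions are equal, both layouts have the same shape, so the shape alone cannot decide.

```python
from typing import Optional, Tuple


def guess_is_transposed(
    shape: Tuple[int, int], in_features: int, out_features: int
) -> Optional[bool]:
    """Hypothetical helper: guess a weight's layout from its shape alone.

    Returns False for (in, out), True for (out, in), and None when the
    shape is ambiguous because the matrix is square.
    """
    if in_features == out_features:
        # Square case: (in, out) and (out, in) are the same shape,
        # so the layout has to come from explicit metadata instead.
        return None
    if shape == (in_features, out_features):
        return False
    if shape == (out_features, in_features):
        return True
    raise ValueError(f"shape {shape} matches neither layout")


if __name__ == "__main__":
    print(guess_is_transposed((4, 8), 4, 8))
    print(guess_is_transposed((8, 4), 4, 8))
    print(guess_is_transposed((4, 4), 4, 4))
```

An explicit flag or configuration value avoids the ambiguous `None` case entirely, which is presumably the motivation behind this commit.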

IlyasMoutawwakil commented on 2026-01-13

IlyasMoutawwakil marked this pull request as ready for review 160 days ago

IlyasMoutawwakil requested a review from vasqu 159 days ago

no need to reload

5bd25c75

fallback to eager

846adcad

IlyasMoutawwakil commented on 2026-01-14

IlyasMoutawwakil requested a review from ArthurZucker 159 days ago

ArthurZucker commented on 2026-01-14

Update src/transformers/models/gpt_oss/modeling_gpt_oss.py

b1a71a79

vasqu commented on 2026-01-14

fix

9dbed89b

force 16-byte alignment during weight loading

2f3fd11c
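
The timeline does not show how the alignment is enforced, so the following is only a sketch of the arithmetic behind a 16-byte alignment requirement. The names `ALIGNMENT`, `is_aligned`, and `align_up` are hypothetical and do not come from the PR.

```python
ALIGNMENT = 16  # bytes; assumed from the commit title


def is_aligned(offset: int, alignment: int = ALIGNMENT) -> bool:
    """Return True when a byte offset is a multiple of the alignment."""
    return offset % alignment == 0


def align_up(offset: int, alignment: int = ALIGNMENT) -> int:
    """Round a byte offset up to the next multiple of the alignment."""
    return (offset + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    for offset in (0, 1, 16, 17):
        print(offset, is_aligned(offset), align_up(offset))
```

A buffer whose start offset fails `is_aligned` would need to be copied or padded to the offset returned by `align_up` before being handed to a kernel that requires aligned memory.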

simplify logic

dd377e19

quantization conversions should be applied first

52e07786

IlyasMoutawwakil commented on 2026-01-15

avoid baddbmm as it is less performant / less optimizable by max-auto…

1c491124

no need for logger

4b0323ce

IlyasMoutawwakil requested a review from ArthurZucker 158 days ago

IlyasMoutawwakil requested a review from vasqu 158 days ago

Merge branch 'main' into gpt-oss-experts-impl

aa34996f

add comment explaining limitation

f094c319

standardize operations and only reshape when needed

221f9bda

Merge branch 'main' into gpt-oss-experts-impl

944afb5c

fixup conversion and test

1fc01dc3

vasqu commented on 2026-01-16

Update src/transformers/conversion_mapping.py

d8207138

force alignment docstring

71fdb18c

move default apply gate

e852cbb0

offsets

d698dcb4

vasqu approved these changes on 2026-01-16

Merge branch 'main' into gpt-oss-experts-impl

5c2ca3cc

add docs and make kernel_config optional

d6631bba

use reshapes as they are equivalent to views when memory is contiguous

4f7226d8

ArthurZucker approved these changes on 2026-01-19

fix and better notes

21173033

reshapes instead of views

944a0eca

Merge branch 'main' into gpt-oss-experts-impl

1a0ea125

keep model saving and reloading in grouped_mm test to catch misalignm…

16e65366

Merge branch 'main' into gpt-oss-experts-impl

75ab2759

Merge branch 'main' into gpt-oss-experts-impl

711a652b

IlyasMoutawwakil force pushed from 36ff79a1 to 711a652b 151 days ago

ArthurZucker approved these changes on 2026-01-22

ArthurZucker merged 2d4d8fe4 into main 151 days ago

ArthurZucker deleted the gpt-oss-experts-impl branch 151 days ago

Reviewers

ArthurZucker

vasqu

Cyrilvallez

transformers GptOss experts implementation #43227 Merged
