GptOss experts implementation (#43227)
* experts impl gpt oss
* no need to transpose dequantized experts
* skip test_reverse_loading_mapping
* fix custom gating
* revert transposition and simply support transposed experts to avoid modifying eager
* style
* don't rely on weight shapes as they can be square matrices
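A minimal sketch of why shape checks are unreliable here (sizes are hypothetical, not the model's real dimensions): when an expert weight is square, its stored layout and its transpose have identical shapes, so shape inspection cannot tell (in_features, out_features) apart from (out_features, in_features).

```python
import torch

d = 16                      # hypothetical hidden size
w = torch.randn(d, d)       # a square expert weight

# Shape alone is ambiguous: the matrix and its transpose look identical,
# so code that infers layout from .shape silently breaks on square experts.
shapes_ambiguous = w.shape == w.t().shape
```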
* no need to reload
* fallback to eager
* Update src/transformers/models/gpt_oss/modeling_gpt_oss.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* force 16-byte alignment during weight loading
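A small sketch of the failure mode this guards against (the 16-byte requirement is the commit's stated constraint; the tensors below are illustrative): fresh PyTorch allocations are over-aligned, but a sliced view into a loaded buffer can start at an arbitrary byte offset, which misaligned-pointer kernels reject.

```python
import torch

buf = torch.empty(64, dtype=torch.bfloat16)

# Fresh allocations from PyTorch's allocator are at least 16-byte aligned...
fresh_aligned = buf.data_ptr() % 16 == 0

# ...but a view offset by one bf16 element (2 bytes) is not, which is why
# weights must be forced onto aligned offsets at load time.
sliced_aligned = buf[1:].data_ptr() % 16 == 0
```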
* simplify logic
* quantization conversions should be applied first
* avoid baddbmm as it is less performant / less optimizable by max-autotune
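The two formulations this commit chooses between can be sketched as follows (sizes are hypothetical): `baddbmm` fuses the bias add into the batched matmul, while the plain `bmm` + add form is mathematically identical and, per the commit, easier for `torch.compile`'s max-autotune to optimize.

```python
import torch

E, T, D_in, D_out = 4, 8, 16, 32            # hypothetical expert/token/feature sizes
hidden = torch.randn(E, T, D_in)            # per-expert token activations
weight = torch.randn(E, D_in, D_out)        # per-expert weights
bias = torch.randn(E, 1, D_out)             # per-expert bias, broadcast over tokens

# Fused form: bias + hidden @ weight in one op.
out_fused = torch.baddbmm(bias, hidden, weight)

# Unfused form preferred here: same result, friendlier to max-autotune.
out_plain = torch.bmm(hidden, weight) + bias
```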
* no need for logger
* add comment explaining limitation
* standardize operations and only reshape when needed
* fixup conversion and test
* Update src/transformers/conversion_mapping.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* force alignment docstring
* move default apply gate
* offsets
* add docs and make kernel_config optional
* use reshapes as they are equivalent to views when memory is contiguous
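The equivalence this commit relies on can be shown in a few lines (toy tensor, not the model's weights): `.view` demands contiguous memory and fails on a transposed tensor, while `.reshape` always succeeds and degrades to a zero-copy view exactly when the memory is contiguous.

```python
import torch

w = torch.randn(4, 6)       # contiguous weight
wt = w.t()                  # transposed view, non-contiguous

# .view requires contiguous memory, so flattening the transpose raises...
view_failed = False
try:
    wt.view(24)
except RuntimeError:
    view_failed = True

# ...while .reshape on a contiguous tensor is a free view (no data copy).
flat = w.reshape(24)
```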
* fix and better notes
* reshapes instead of views
* keep model saving and reloading in grouped_mm test to catch misalignment issues
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>