batched and grouped experts implementations (#42697)
* MoE implementation
* support more MoEs
* tests
* add comments
* add grouped_mm support
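The grouped experts path mentioned above can be sketched as a per-expert masked matmul, assuming tokens are routed to experts by an index tensor; a fused grouped-matmul kernel would replace the Python loop. All names here are illustrative, not the actual implementation:

```python
import torch

def grouped_experts_forward(hidden, expert_index, expert_weights):
    """Route each token through its assigned expert's weight matrix.

    hidden:         (num_tokens, hidden_dim)
    expert_index:   (num_tokens,) long tensor of expert ids
    expert_weights: (num_experts, hidden_dim, out_dim)

    Illustrative sketch only: a grouped-mm kernel fuses this loop
    into a single call over all expert groups.
    """
    out = torch.zeros(hidden.shape[0], expert_weights.shape[2], dtype=hidden.dtype)
    for e in range(expert_weights.shape[0]):
        mask = expert_index == e
        if mask.any():
            out[mask] = hidden[mask] @ expert_weights[e]
    return out

tokens = torch.randn(8, 4)
routing = torch.randint(0, 2, (8,))
weights = torch.randn(2, 4, 16)
out = grouped_experts_forward(tokens, routing, weights)  # shape (8, 16)
```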
* typing act_fn and adding stride 16 note
* style
* fix dbrx config
* fix config test
* add licence and better stride conditions
* comment
* no need to pad tensors to 16-byte strides if we make sure our tiny testing models have 16-byte-aligned weights
* use a class decorator with a registration interface
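The class decorator with a registration interface mentioned above can be sketched like this; the registry and class names are hypothetical, not the actual implementation:

```python
# Hypothetical registry mapping implementation names to classes.
EXPERTS_IMPLEMENTATIONS = {}

def register_experts_implementation(name):
    """Class decorator: register `cls` under `name` so it can be
    looked up later (e.g. when resolving a config option)."""
    def decorator(cls):
        EXPERTS_IMPLEMENTATIONS[name] = cls
        return cls
    return decorator

@register_experts_implementation("batched")
class BatchedExperts:
    pass

@register_experts_implementation("grouped")
class GroupedExperts:
    pass
```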
* remove line
* remove unnecessary
* register config with the decorator
* fix redundant
* reduce changes some more
* fix
* fix
* import from integrations
* remove empty lines
* use histc instead of bincount
* fix cpu histc not supporting long
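Counting tokens per expert with `torch.histc` instead of `torch.bincount`, per the two items above, looks roughly like this; on CPU, `histc` only supports floating-point inputs, hence the cast. An illustrative sketch, not the actual code:

```python
import torch

def tokens_per_expert(expert_index, num_experts):
    # torch.histc does not support integer (long) tensors on CPU,
    # so cast the expert indices to float before counting.
    counts = torch.histc(
        expert_index.float(), bins=num_experts, min=0, max=num_experts - 1
    )
    return counts.long()

idx = torch.tensor([0, 2, 2, 1, 0, 2])
print(tokens_per_expert(idx, 3))  # tensor([2, 1, 3])
```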
* docs
* added benchmark to docs
* add to from_pretrained's docstring
* make grouped_mm the default when possible
* Update docs/source/en/experts_interface.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestion from @stevhliu
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* make qwen3 vl moe inherit its experts and sparse MoE blocks from qwen3 moe so it uses the experts implementation
* create _supports_grouped_mm flag and use it for testing
* fix copies
* better grouped mm checks
* fix model size failure
* better docs
* get rid of class property _supports_grouped_mm
* add method calling checks and fix models that didn't have experts
* fix copies
* fix
* fix
* more cleanup
* clean
* document compilation behaviour
* docs
* fix new moe after merge
* fix the new ernie 4.5 vl moe testing
* support fullgraph automatic compilation for MoEs
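Automatic fullgraph compilation for MoEs, as in the item above, amounts to something like the following hypothetical sketch; the function and flag names are illustrative only:

```python
import torch

def maybe_compile_experts(experts_forward, supports_fullgraph=True):
    """Hypothetical sketch: compile the experts forward with
    fullgraph=True (no graph breaks allowed) when the model supports
    it; models whose top-k gating breaks the graph (e.g. granitemoe,
    jetmoe, per the notes below) fall back to eager execution."""
    if supports_fullgraph:
        return torch.compile(experts_forward, fullgraph=True)
    return experts_forward  # eager fallback
```

The eager fallback keeps behaviour identical for models that cannot be captured in a single graph.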
* fix lazy initialization
* disable fullgraph for granitemoe and jetmoe because of topk gating
* avoid implicit fallback in experts implementation and only do it when auto-compiling
* style
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>