transformers
0642963b - batched and grouped experts implementations (#42697)

Committed 91 days ago
batched and grouped experts implementations (#42697)

* moe implementation
* support more MoEs
* tests
* add comments
* add grouped_mm support
* typing act_fn and adding stride 16 note
* style
* fix dbrx config
* fix config test
* add license and better stride conditions
* comment
* no need to pad tensors to 16-byte strides if we made sure our tiny testing models have 16-byte-aligned weights
* use a class decorator with a registration interface
* remove line
* remove unnecessary
* register config with the decorator
* fix redundant
* reduce changes some more
* fix
* import from integrations
* remove empty lines
* use histc instead of bincount
* fix cpu histc not supporting long
* docs
* added benchmark to docs
* add to from_pretrained's docstring
* make grouped_mm the default when possible
* Update docs/source/en/experts_interface.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestion from @stevhliu Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* make qwen3 vl moe inherit its experts and sparse moe blocks from qwen3 moe, making it use the experts implementation
* create _supports_grouped_mm flag and use it for testing
* fix copies
* better grouped mm checks
* fix model size failure
* better docs
* get rid of class property _supports_grouped_mm
* add method calling checks and fix models that didn't have experts
* fix copies
* fix
* more cleanup
* clean
* document compilation behaviour
* docs
* fix new moe after merge
* fix the new ernie 4.5 vl moe testing
* support fullgraph automatic compilation for MoEs
* fix lazy initialization
* disable fullgraph for granitemoe and jetmoe because of topk gating
* avoid implicit fallback in experts implementation and only do it when auto-compiling
* style

--------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
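The "batched" strategy named in the commit title can be sketched as follows: run every token through every expert with two batched matmuls, then gather each token's routed output. This is an illustrative reconstruction assuming top-1 routing and a SiLU MLP, not the actual transformers code; all names are made up.

```python
import torch

def batched_experts_forward(hidden, expert_w1, expert_w2, router_indices):
    """Sketch of a 'batched' experts forward (hypothetical names).

    Every token is run through every expert via batched matmuls, then the
    output of each token's routed expert is gathered. Assumes top-1 routing.
    """
    num_experts = expert_w1.shape[0]
    # (E, T, H): replicate the token batch once per expert
    x = hidden.unsqueeze(0).expand(num_experts, -1, -1)
    h = torch.nn.functional.silu(torch.bmm(x, expert_w1))  # (E, T, F)
    out = torch.bmm(h, expert_w2)                          # (E, T, H)
    # for each token t, select out[router_indices[t], t, :]
    idx = router_indices.view(1, -1, 1).expand(1, -1, out.shape[-1])
    return out.gather(0, idx).squeeze(0)                   # (T, H)

T, H, F, E = 6, 8, 16, 4
hidden = torch.randn(T, H)
w1 = torch.randn(E, H, F)
w2 = torch.randn(E, F, H)
routes = torch.tensor([0, 2, 2, 1, 0, 3])
print(batched_experts_forward(hidden, w1, w2, routes).shape)  # torch.Size([6, 8])
```

The trade-off: every token pays for every expert's FLOPs, but the kernel shape is static, which makes the path simple to compile.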
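The "grouped" strategy ("add grouped_mm support", "make grouped_mm the default when possible") can be read as: sort tokens by their routed expert so each expert sees one contiguous slice, then run one matmul pair per expert. A real grouped-GEMM kernel fuses the per-expert loop into a single launch; the Python loop below is a readable stand-in under the same top-1 routing assumption, with invented names.

```python
import torch

def grouped_experts_forward(hidden, expert_w1, expert_w2, router_indices):
    """Sketch of a 'grouped' experts forward (hypothetical names).

    Tokens are permuted so each expert's tokens are contiguous; each expert
    then runs a dense matmul on only its own slice. A grouped-GEMM kernel
    would replace the loop with a single launch.
    """
    num_experts = expert_w1.shape[0]
    order = torch.argsort(router_indices)
    counts = torch.bincount(router_indices, minlength=num_experts)
    sorted_tokens = hidden[order]
    out = torch.empty_like(sorted_tokens)
    start = 0
    for e in range(num_experts):
        end = start + int(counts[e])
        if end > start:
            h = torch.nn.functional.silu(sorted_tokens[start:end] @ expert_w1[e])
            out[start:end] = h @ expert_w2[e]
        start = end
    # scatter results back to the original token order
    unsorted = torch.empty_like(out)
    unsorted[order] = out
    return unsorted

T, H, F, E = 6, 8, 16, 4
hidden = torch.randn(T, H)
w1 = torch.randn(E, H, F)
w2 = torch.randn(E, F, H)
routes = torch.tensor([0, 2, 2, 1, 0, 3])
print(grouped_experts_forward(hidden, w1, w2, routes).shape)  # torch.Size([6, 8])
```

Unlike the batched path, each token here only pays for its own expert, which is why the commit prefers grouped_mm "when possible"; the stride-16 bullets suggest the fused kernel additionally needs aligned weight strides, hence the note about 16-byte-aligned testing models.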
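The bullets "use histc instead of bincount" and "fix cpu histc not supporting long" concern counting how many tokens were routed to each expert. A plausible sketch: `torch.histc` always returns exactly `bins` buckets, so its output shape is static regardless of the data, whereas `torch.bincount` is data-dependent — my reading of the rationale, not confirmed from the diff. The float cast works around CPU `histc` rejecting integer dtypes.

```python
import torch

def tokens_per_expert(router_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    # torch.histc returns a fixed `bins`-length tensor, which is friendlier
    # to static-shape compilation than torch.bincount (assumption based on
    # the commit message, not the diff).
    # CPU histc does not support long tensors, hence the float cast.
    counts = torch.histc(
        router_indices.float(), bins=num_experts, min=0, max=num_experts - 1
    )
    return counts.long()

indices = torch.tensor([0, 2, 2, 1, 0, 2])
print(tokens_per_expert(indices, num_experts=4))  # tensor([2, 1, 3, 0])
```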
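"use a class decorator with a registration interface" / "register config with the decorator" describes a common pattern: a decorator that records which config classes opt into the experts interface. The sketch below is hypothetical — the registry name, decorator name, and config class are all invented, not the transformers API.

```python
# Hypothetical registry; the real names in transformers differ.
_EXPERTS_REGISTRY: dict[str, type] = {}

def register_experts_config(config_class: type) -> type:
    """Class decorator: record that this config supports the experts interface."""
    _EXPERTS_REGISTRY[config_class.__name__] = config_class
    return config_class  # returned unchanged, so the class is usable as normal

@register_experts_config
class MyMoeConfig:
    num_experts = 8

print("MyMoeConfig" in _EXPERTS_REGISTRY)  # True
```

The appeal of the pattern is that registration happens at class-definition time, so support checks reduce to a dictionary lookup instead of per-model flags (the commit later removes a `_supports_grouped_mm` class property in favor of method-calling checks).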