permute/unpermute kernel for moe optimization #14568
bnellnm approved these changes on 2025-04-28
implement moe permute kernel
65e8abe0
implement moe unpermute
45884cf0
update code and fix pre-commit
b1fd42bc
add expert_map support for moe permute
1abc4c23
add expert_map support for moe unpermute
4f29dc0f
fix mismatch and add more test cases
42507df4
support align_block_size for contiguous group gemm in deepgemm
6960faaa
for each local valid expert, fill padding row with expert_id in `m_ind…
88168126
add fill_invalid_expert to work around deepgemm not supporting -1 in m_indices
d453aa7f
update code according to bnellnm's comment:
3db8defb
remove arch limit in cmake and add return `token_expert_indices` in a…
c7e58308
fix pre-commit failure
840cd419
1. fix failing call to FusedMoE.select_experts
b29bacff
CalebDu force-pushed to b29bacff 1 year ago
simon-mo merged 3e887d2e into main 1 year ago