onnxruntime
a2b4546c - [CUDA] Support SwiGlu in MoE and qMoE (#25530)

Commit
216 days ago
[CUDA] Support SwiGlu in MoE and qMoE (#25530)

### Description

This implements the SwiGLU activation for MoE and qMoE. The activation corresponds to https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/swiglu.py.

Also update test_parity_moe.py to enable the qMoE tests in CI pipelines.

### Motivation and Context

This is a naive implementation of the activation. Since the activation reduces each row to half its length, we cannot directly use an epilogue. The current implementation needs an extra buffer to run the SwiGLU kernel. In the future, we might look at alternatives that do not need the extra buffer.
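To illustrate why the activation halves each row, here is a minimal pure-Python sketch of a SwiGLU reference. It is not the CUDA kernel from this commit. The interleaved gate/linear layout, the `alpha` and `limit` parameters and their default values, and the `+ 1` on the linear half are assumptions modeled on the linked Triton kernel; the names `swiglu_row` and `swiglu` are hypothetical.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function for large negative or positive x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swiglu_row(row, alpha=1.702, limit=7.0):
    """Apply SwiGLU to one row of length 2N and return a row of length N.

    Assumed layout (not confirmed by the commit): even indices hold the
    gate values, odd indices hold the linear values. alpha and limit are
    illustrative defaults.
    """
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    gate = row[0::2]    # N gate values
    linear = row[1::2]  # N linear values
    out = []
    for g, l in zip(gate, linear):
        g = min(g, limit)                # clamp gate from above
        l = max(-limit, min(l, limit))   # clamp linear to [-limit, limit]
        out.append(g * _sigmoid(alpha * g) * (l + 1.0))
    return out


def swiglu(matrix, alpha=1.702, limit=7.0):
    # The output is a separate list with half as many columns as the input,
    # which mirrors the extra buffer the commit message describes.
    return [swiglu_row(r, alpha=alpha, limit=limit) for r in matrix]
```

Because every output element consumes two input elements, the result has a different shape from the input, which is consistent with the commit's note that the activation cannot simply be applied in place in an epilogue.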