[GPU] MoE optimization (#32469)

Commit


### Details:
- Objective: skip unused experts in the MoE pattern.
- To this end, added the following primitives optimized for MoE:
  - moe_mask_gen: provides information on which experts are used
  - moe_gemm: runs only the needed experts
  - moe_gather: gathers the inputs for each expert of moe_gemm (originated from https://github.com/openvinotoolkit/openvino/pull/32317)
  - moe_scatter_reduce: restores the MoE results to the original layout (originated from https://github.com/openvinotoolkit/openvino/pull/32454)
  - swiglu extension for clamp: extended swiglu for the gpt-oss pattern (from https://github.com/openvinotoolkit/openvino/pull/32365)

### Tickets:
- CVS-175117, CVS-173490, CVS-174726, CVS-174518, CVS-174589

---------

Co-authored-by: Lee, Chon Ming <chon.ming.lee@intel.com>
Co-authored-by: chenhu-wang <chenhu.wang@intel.com>
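The commit message only names the primitives. To illustrate how they might fit together, here is a minimal pure-Python sketch, not the OpenVINO GPU kernels. moe_mask_gen records which tokens each expert received, moe_gemm skips experts with no tokens and gathers inputs (the moe_gather step) for the rest, and moe_scatter_reduce writes the router-weighted results back in the original token order. The expert FFN uses a clamped SwiGLU, assuming the gpt-oss reference formulation (alpha = 1.702, limit = 7.0) with separate gate and up weights. All signatures, data layouts, and parameter values here are illustrative assumptions.

```python
import math

# Hypothetical sketch of a "skip unused experts" MoE layer. Names echo the
# primitives in the commit message; signatures and layouts are assumptions.

def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_clamp(gate, up, alpha=1.702, limit=7.0):
    """Clamped SwiGLU, assuming the gpt-oss reference formulation."""
    out = []
    for g, u in zip(gate, up):
        g = min(g, limit)               # gate branch: clamped from above only
        u = max(-limit, min(u, limit))  # linear branch: clamped to [-limit, limit]
        out.append(g / (1.0 + math.exp(-alpha * g)) * (u + 1.0))
    return out

def moe_mask_gen(topk_ids, num_experts):
    """Map each expert to its routed tokens; an empty list marks an unused expert."""
    tokens_per_expert = [[] for _ in range(num_experts)]
    for tok, experts in enumerate(topk_ids):
        for e in experts:
            tokens_per_expert[e].append(tok)
    return tokens_per_expert

def moe_gather(hidden, token_ids):
    """Pack the hidden rows of one expert's tokens into a contiguous batch."""
    return [hidden[t] for t in token_ids]

def moe_gemm(hidden, tokens_per_expert, experts):
    """Run the expert FFN only for experts that received at least one token."""
    results = {}
    for e, token_ids in enumerate(tokens_per_expert):
        if not token_ids:
            continue  # unused expert: no gather, no GEMM
        w_gate, w_up, w_down = experts[e]
        batch = moe_gather(hidden, token_ids)
        results[e] = [
            matvec(w_down, swiglu_clamp(matvec(w_gate, x), matvec(w_up, x)))
            for x in batch
        ]
    return results

def moe_scatter_reduce(results, tokens_per_expert, topk_ids, topk_weights, num_tokens, dim):
    """Accumulate router-weighted expert outputs back into original token order."""
    out = [[0.0] * dim for _ in range(num_tokens)]
    for e, rows in results.items():
        for tok, row in zip(tokens_per_expert[e], rows):
            w = topk_weights[tok][topk_ids[tok].index(e)]
            for d in range(dim):
                out[tok][d] += w * row[d]
    return out

def const(rows, cols, v):
    """Build a rows x cols matrix filled with v (toy weights)."""
    return [[v] * cols for _ in range(rows)]

# Usage: 2 tokens, 4 experts, top-1 routing to experts 0 and 2.
dim, hidden_dim, num_experts = 2, 3, 4
experts = [
    (const(hidden_dim, dim, 0.1 * (e + 1)),  # w_gate
     const(hidden_dim, dim, 0.2),            # w_up
     const(dim, hidden_dim, 0.5))            # w_down
    for e in range(num_experts)
]
hidden = [[1.0, 2.0], [-1.0, 0.5]]
topk_ids = [[0], [2]]
topk_weights = [[1.0], [1.0]]

mask = moe_mask_gen(topk_ids, num_experts)
results = moe_gemm(hidden, mask, experts)
out = moe_scatter_reduce(results, mask, topk_ids, topk_weights, len(hidden), dim)
print(sorted(results))  # experts that actually ran: [0, 2]
```

In this sketch, skipping an expert simply means the loop never gathers inputs or multiplies weights for it. A dense layer that ran every expert and applied zero routing weights to the unselected ones would produce the same output with more work.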

References

#32469 - [GPU] MoE optimization

Author

yeonbok

Parents

d8e40429

openvino 3010f953 - [GPU] MoE optimization (#32469)