[CPU] Improve QMoE kernel #25822
Fixes CPU kernel
adcdca72
Additional fixes
541e08b8
Optimizations
764b55a1
Fix pipelines
27d05d5b
Address comments
c1500b8d
Address comments
85268ad7
Revert "Address comments"
37c0858b
Fix the memory optimization issue
85874ff1
Fix race condition
1c9f927b
Fix unused variables
f7746829
Optimizations
728d7a88
Fix
c2386f5b
Debugging a lot
a6da84db
Remove comments
e2c5d689
Some modifications
4c905ae8
FC1 fixed
c3647589
Working fix
ed52e130
Remove print statements
1ea12bca
Low diff values
f5be0cec
Rebase with main
e450158b
Fix
471bb8b1
Fix tests
b015c3de
Fix pipelines
2b674658
refactoring
f85a9f16
format
1bcb20d0
parallel optimization
25aa31bf
fix build
ca180b66
eliminate the intermediate memcpy after SwiGLU
6a484862
parallelize the routing logic
c369322e
format
73a437c4
refactoring output
94a27297
Fix pipelines
5de1b217
Update cpu tests to use same python reference implementation as cuda …
27c1c055
apsonawane force pushed from a39580c5 to 27c1c055 211 days ago
Fix tests
81e6713a
Remove failing CPU test
d11f51cf
Add legacy shape check back
a7978f88
apsonawane force pushed from c9cdf689 to a7978f88 210 days ago
tianleiwu approved these changes on 2025-08-26
apsonawane marked this pull request as ready for review 210 days ago
apsonawane deleted the asonawane/qmoe branch 210 days ago