vllm
[Feature][Quantization] MXFP4 support for MOE models
#17888
Merged
simon-mo merged 54 commits into vllm-project:main from fxmarty-amd:mxfp4_moe
MXFP4
73f7ce1b
Separate moe to another PR
b8596ca2
lint
951d5de4
fxmarty-amd
requested a review
from
mgoin
219 days ago
fxmarty-amd
requested a review
from
robertgshaw2-redhat
219 days ago
fxmarty-amd
requested a review
from
tlrmchlsmth
219 days ago
mergify
added
needs-rebase
wip
e6e73b3b
large moe support
24a9f4e0
use kernels
97a3fb64
use dynamic quant kernel for moe activation
35e02c2b
add kernel/non-kernel branches for mxfp4
7a0c064b
wip
886ab84b
large moe support
e665798b
set VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM to 'hip', 'triton' or 'torch' to…
7623bc85
fix
09fafb67
Move all kernels into Quark (#3)
fadffba2
rebase fixup
415b8d95
fxmarty-amd
force pushed
from
cff73cd5
to
415b8d95
215 days ago
Merge branch 'main' into mxfp4_moe
2ab5c242
mergify
removed
needs-rebase
style
469e79cb
fxmarty-amd
force pushed
to
469e79cb
215 days ago
fix style
ed3969f1
add test and documentation
e53016ea
fxmarty-amd
force pushed
to
e53016ea
214 days ago
mergify
added
documentation
style
4edf7844
fxmarty-amd
force pushed
to
4edf7844
214 days ago
mergify
added
needs-rebase
Merge branch 'main' into mxfp4_moe
e003d100
mergify
removed
needs-rebase
fix conflicts
66792463
style
ee805f8e
fxmarty-amd
force pushed
to
ee805f8e
213 days ago
style bis
5fcc61a9
Merge branch 'main' into mxfp4_moe
16d370ba
add accuracy test
74a07ac7
style
d47af236
fix test
5c7e12d5
mgoin
commented on 2025-05-19
address review comments
e8087df5
fxmarty-amd
force pushed
to
e8087df5
208 days ago
fxmarty-amd
requested a review
from
mgoin
208 days ago
mergify
added
needs-rebase
Merge branch 'main' into mxfp4_moe
28a3d143
fxmarty-amd
requested a review
from
hmellor
201 days ago
mergify
removed
needs-rebase
merge fixes
f7ce3904
style
360b03f1
skip tests if not enough gpus
877b7d18
typo
efe7c3cc
mgoin
added
quantization
mgoin
added
ready
mergify
added
needs-rebase
Merge branch 'main' into mxfp4_moe
7511ad6f
mergify
removed
needs-rebase
Merge branch 'main' into mxfp4_moe
31264d8d
mgoin
commented on 2025-06-16
add missing args in examples
b90a85c3
fxmarty-amd
requested a review
from
WoosukKwon
180 days ago
remove VLLM_QUARK_EMU_MEM_OPT, always keeps mxfp4 weights in low prec…
1cd9359a
use emulate=True for mxfp4 gemm on cdna4 until real kernels are integ…
26061489
add slow/non-optimized reference torch mxfp4 quant and qdq implementa…
42d97884
style
3da6d24b
style 2
65e9250d
mgoin
approved these changes on 2025-06-19
Merge branch 'main' into mxfp4_moe
978b7402
style
d65eaf3b
fxmarty-amd
force pushed
to
d65eaf3b
170 days ago
fix tests
d31db577
linting
e03aeee4
bnellnm
commented on 2025-06-27
bnellnm
commented on 2025-06-27
mergify
added
needs-rebase
Merge branch 'main' into mxfp4_moe
68bb0750
fix updates with main and address comments
a07cd03c
linting
2e7fcd7d
linting 2
cb0292d2
mergify
removed
needs-rebase
update doc
fa460a3c
linting 3
e570709c
mgoin
commented on 2025-07-09
import fused_experts lazily
90a01bbd
pass activation arg
7334abcf
remove per_channel_quant=True
4ffff1df
simon-mo merged 332d4cb1 into main 157 days ago
Reviewers
mgoin
bnellnm
robertgshaw2-redhat
tlrmchlsmth
hmellor
WoosukKwon
Assignees
No one assigned
Labels
documentation
quantization
ready
Milestone
No milestone