vllm
[Feature][Quantization] MXFP4 support for MOE models
#17888
Merged

simon-mo merged 54 commits into vllm-project:main from fxmarty-amd:mxfp4_moe
fxmarty-amd
fxmarty-amd MXFP4
73f7ce1b
BowenBao Separate moe to another PR
b8596ca2
BowenBao lint
951d5de4
fxmarty-amd requested a review from mgoin 219 days ago
fxmarty-amd requested a review from robertgshaw2-redhat 219 days ago
fxmarty-amd requested a review from tlrmchlsmth 219 days ago
github-actions
DarkLight1337
mergify
mergify added needs-rebase
fxmarty-amd wip
e6e73b3b
BowenBao large moe support
24a9f4e0
fxmarty-amd use kernels
97a3fb64
BowenBao use dynamic quant kernel for moe activation
35e02c2b
fxmarty-amd add kernel/non-kernel branches for mxfp4
7a0c064b
fxmarty-amd wip
886ab84b
BowenBao large moe support
e665798b
fxmarty-amd set VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM to 'hip', 'triton' or 'torch' to…
7623bc85
fxmarty-amd fix
09fafb67
BowenBao Move all kernels into Quark (#3)
fadffba2
fxmarty-amd rebase fixup
415b8d95
fxmarty-amd force pushed from cff73cd5 to 415b8d95 215 days ago
fxmarty-amd Merge branch 'main' into mxfp4_moe
2ab5c242
mergify removed needs-rebase
fxmarty-amd style
469e79cb
fxmarty-amd force pushed to 469e79cb 215 days ago
fxmarty-amd fix style
ed3969f1
fxmarty-amd add test and documentation
e53016ea
fxmarty-amd force pushed to e53016ea 214 days ago
mergify added documentation
fxmarty-amd style
4edf7844
fxmarty-amd force pushed to 4edf7844 214 days ago
BowenBao
mergify
mergify added needs-rebase
fxmarty-amd Merge branch 'main' into mxfp4_moe
e003d100
mergify removed needs-rebase
fxmarty-amd fix conflicts
66792463
fxmarty-amd style
ee805f8e
fxmarty-amd force pushed to ee805f8e 213 days ago
fxmarty-amd style bis
5fcc61a9
fxmarty-amd Merge branch 'main' into mxfp4_moe
16d370ba
fxmarty-amd add accuracy test
74a07ac7
fxmarty-amd style
d47af236
fxmarty-amd
fxmarty-amd fix test
5c7e12d5
mgoin
mgoin commented on 2025-05-19
fxmarty-amd address review comments
e8087df5
fxmarty-amd force pushed to e8087df5 208 days ago
fxmarty-amd requested a review from mgoin 208 days ago
mergify
mergify added needs-rebase
fxmarty-amd Merge branch 'main' into mxfp4_moe
28a3d143
fxmarty-amd requested a review from hmellor 201 days ago
mergify removed needs-rebase
fxmarty-amd merge fixes
f7ce3904
fxmarty-amd style
360b03f1
fxmarty-amd skip tests if not enough gpus
877b7d18
fxmarty-amd
fxmarty-amd typo
efe7c3cc
mgoin added quantization
mgoin added ready
mergify
mergify added needs-rebase
fxmarty-amd Merge branch 'main' into mxfp4_moe
7511ad6f
mergify removed needs-rebase
fxmarty-amd Merge branch 'main' into mxfp4_moe
31264d8d
fxmarty-amd
mgoin
mgoin
mgoin
mgoin commented on 2025-06-16
fxmarty-amd add missing args in examples
b90a85c3
fxmarty-amd requested a review from WoosukKwon 180 days ago
fxmarty-amd remove VLLM_QUARK_EMU_MEM_OPT, always keeps mxfp4 weights in low prec…
1cd9359a
fxmarty-amd use emulate=True for mxfp4 gemm on cdna4 until real kernels are integ…
26061489
fxmarty-amd add slow/non-optimized reference torch mxfp4 quant and qdq implementa…
42d97884
fxmarty-amd style
3da6d24b
fxmarty-amd style 2
65e9250d
fxmarty-amd
mgoin
mgoin approved these changes on 2025-06-19
fxmarty-amd
fxmarty-amd Merge branch 'main' into mxfp4_moe
978b7402
fxmarty-amd style
d65eaf3b
fxmarty-amd force pushed to d65eaf3b 170 days ago
fxmarty-amd
fxmarty-amd fix tests
d31db577
fxmarty-amd linting
e03aeee4
bnellnm
bnellnm commented on 2025-06-27
bnellnm
bnellnm commented on 2025-06-27
mergify
mergify added needs-rebase
fxmarty-amd Merge branch 'main' into mxfp4_moe
68bb0750
fxmarty-amd fix updates with main and address comments
a07cd03c
fxmarty-amd linting
2e7fcd7d
fxmarty-amd linting 2
cb0292d2
mergify removed needs-rebase
fxmarty-amd
fxmarty-amd update doc
fa460a3c
fxmarty-amd linting 3
e570709c
fxmarty-amd
mgoin
mgoin
mgoin commented on 2025-07-09
fxmarty-amd import fused_experts lazily
90a01bbd
fxmarty-amd pass activation arg
7334abcf
fxmarty-amd remove per_channel_quant=True
4ffff1df
fxmarty-amd
simon-mo merged 332d4cb1 into main 157 days ago