vllm
[Feature][Quantization] MXFP4 support for MOE models
#17888
Merged
Commits (54)
MXFP4 (fxmarty-amd committed 230 days ago)
Separate moe to another PR (fxmarty-amd committed 230 days ago)
lint (fxmarty-amd committed 230 days ago)
wip (fxmarty-amd committed 225 days ago)
large moe support (fxmarty-amd committed 225 days ago)
use kernels (fxmarty-amd committed 225 days ago)
use dynamic quant kernel for moe activation (fxmarty-amd committed 225 days ago)
add kernel/non-kernel branches for mxfp4 (fxmarty-amd committed 225 days ago)
wip (fxmarty-amd committed 225 days ago)
large moe support (fxmarty-amd committed 225 days ago)
set VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM to 'hip', 'triton' or 'torch' to select the q/dq/qdq implem for mxfp4 (fxmarty-amd committed 225 days ago)
fix (fxmarty-amd committed 225 days ago)
Move all kernels into Quark (#3) (fxmarty-amd committed 225 days ago)
rebase fixup (fxmarty-amd committed 225 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 225 days ago)
style (fxmarty-amd committed 225 days ago)
fix style (fxmarty-amd committed 225 days ago)
add test and documentation (fxmarty-amd committed 225 days ago)
style (fxmarty-amd committed 225 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 223 days ago)
fix conflicts (fxmarty-amd committed 223 days ago)
style (fxmarty-amd committed 223 days ago)
style bis (fxmarty-amd committed 223 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 219 days ago)
add accuracy test (fxmarty-amd committed 219 days ago)
style (fxmarty-amd committed 219 days ago)
fix test (fxmarty-amd committed 219 days ago)
address review comments (fxmarty-amd committed 218 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 211 days ago)
merge fixes (fxmarty-amd committed 211 days ago)
style (fxmarty-amd committed 211 days ago)
skip tests if not enough gpus (fxmarty-amd committed 211 days ago)
typo (fxmarty-amd committed 211 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 203 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 191 days ago)
add missing args in examples (fxmarty-amd committed 190 days ago)
remove VLLM_QUARK_EMU_MEM_OPT, always keeps mxfp4 weights in low precision (fxmarty-amd committed 190 days ago)
use emulate=True for mxfp4 gemm on cdna4 until real kernels are integrated (fxmarty-amd committed 190 days ago)
add slow/non-optimized reference torch mxfp4 quant and qdq implementation (fxmarty-amd committed 190 days ago)
style (fxmarty-amd committed 190 days ago)
style 2 (fxmarty-amd committed 190 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 180 days ago)
style (fxmarty-amd committed 180 days ago)
fix tests (fxmarty-amd committed 180 days ago)
linting (fxmarty-amd committed 180 days ago)
Merge branch 'main' into mxfp4_moe (fxmarty-amd committed 169 days ago)
fix updates with main and address comments (fxmarty-amd committed 169 days ago)
linting (fxmarty-amd committed 169 days ago)
linting 2 (fxmarty-amd committed 169 days ago)
update doc (fxmarty-amd committed 169 days ago)
linting 3 (fxmarty-amd committed 169 days ago)
import fused_experts lazily (fxmarty-amd committed 168 days ago)
pass activation arg (fxmarty-amd committed 168 days ago)
remove per_channel_quant=True (fxmarty-amd committed 168 days ago)
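Several commits above mention emulated MXFP4 quantize and quantize-dequantize ("qdq") paths, including a slow reference torch implementation. The sketch below is not that implementation. It is a minimal pure-Python illustration of MXFP4 qdq following the OCP Microscaling (MX) v1.0 scheme, under these assumptions:

- Values are split into blocks of 32 along a flat list, and the last block may be shorter.
- Each block shares a power-of-two (E8M0) scale of 2^(floor(log2(amax)) - 2).
- Elements are FP4 E2M1, rounded to nearest with ties to the even code and saturated at ±6. The tie-breaking rule is an assumption.
- Inputs are finite.

The helper names (`quantize_block`, `mxfp4_qdq`) are hypothetical and are not vLLM or Quark APIs.

```python
import math

# FP4 E2M1 magnitudes indexed by their 3-bit code (the sign bit is kept separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2                      # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)
BLOCK_SIZE = 32                    # assumed MX block size
E8M0_MIN_EXP, E8M0_MAX_EXP = -127, 127


def _round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6; ties go to the even code."""
    mag = min(abs(x), E2M1_GRID[-1])
    code = min(range(len(E2M1_GRID)),
               key=lambda c: (abs(E2M1_GRID[c] - mag), c % 2))
    return math.copysign(E2M1_GRID[code], x)


def quantize_block(block):
    """Return (shared_exp, fp4_values) so that block[i] ~= fp4_values[i] * 2**shared_exp."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)   # hypothetical choice of scale for an all-zero block
    _, e = math.frexp(amax)            # amax = m * 2**e with 0.5 <= m < 1
    shared_exp = (e - 1) - E2M1_EMAX   # floor(log2(amax)) - emax of the element format
    shared_exp = max(E8M0_MIN_EXP, min(E8M0_MAX_EXP, shared_exp))
    scale = math.ldexp(1.0, shared_exp)
    return shared_exp, [_round_to_e2m1(v / scale) for v in block]


def mxfp4_qdq(values):
    """Quantize then dequantize a flat list block by block (emulation only, no bit packing)."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        shared_exp, q = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(math.ldexp(p, shared_exp) for p in q)
    return out


if __name__ == "__main__":
    x = [0.1, -0.7, 1.3, 2.9, 5.0, -6.2]
    print(mxfp4_qdq(x))  # [0.0, -0.5, 1.5, 3.0, 4.0, -6.0]
```

In this example, 5.0 sits exactly between 4 and 6 and rounds to 4 (the even code), while -6.2 saturates to -6. The sketch keeps the FP4 values as Python floats rather than packing two 4-bit codes per byte, so it only models the numerical error of the format, not its memory savings.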