PR #17888 [Feature][Quantization] MXFP4 support for MOE models

MXFP4

fxmarty-amd committed 230 days ago

Separate moe to another PR

fxmarty-amd committed 230 days ago

lint

fxmarty-amd committed 230 days ago

wip

fxmarty-amd committed 225 days ago

large moe support

fxmarty-amd committed 225 days ago

use kernels

fxmarty-amd committed 225 days ago

use dynamic quant kernel for moe activation

fxmarty-amd committed 225 days ago
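The commit above ("use dynamic quant kernel for moe activation") refers to dynamic quantization, where the MXFP4 scales for activations are computed at runtime from the data instead of being fixed offline. The sketch below shows only the per-block scale computation, assuming the OCP Microscaling (MX) convention of 32-element blocks, a shared exponent of floor(log2(amax)) - 2 for FP4 E2M1, and E8M0 storage with a bias of 127. The function name `dynamic_mxfp4_scales` is hypothetical and is not the kernel used in the PR.

```python
import math
from typing import List

BLOCK_SIZE = 32      # MX block size (assumed, per the OCP Microscaling spec)
FP4_E2M1_EMAX = 2    # exponent of the largest FP4 E2M1 value: 6 = 1.5 * 2**2
E8M0_BIAS = 127      # E8M0 stores the shared exponent with a bias of 127


def dynamic_mxfp4_scales(activations: List[float]) -> List[int]:
    """Hypothetical helper: one biased E8M0 exponent per block of 32 values."""
    scales: List[int] = []
    for start in range(0, len(activations), BLOCK_SIZE):
        block = activations[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            shared_exp = 0  # all-zero block: any scale works; 0 is an arbitrary choice
        else:
            # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1
            shared_exp = math.frexp(amax)[1] - 1 - FP4_E2M1_EMAX
        # Clamp to the representable E8M0 exponent range, then add the bias
        shared_exp = max(-E8M0_BIAS, min(E8M0_BIAS, shared_exp))
        scales.append(shared_exp + E8M0_BIAS)
    return scales
```

Because these scales depend on the current batch of activations, they are recomputed on every forward pass, whereas weight scales can be computed once when the checkpoint is quantized.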

add kernel/non-kernel branches for mxfp4

fxmarty-amd committed 225 days ago

wip

fxmarty-amd committed 225 days ago

large moe support

fxmarty-amd committed 225 days ago

set VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM to 'hip', 'triton' or 'torch' to select the q/dq/qdq implementation for mxfp4

fxmarty-amd committed 225 days ago
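The commit above selects the MXFP4 quantize/dequantize/qdq backend through the `VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM` environment variable. A minimal sketch of such a selector is shown below. The helper name `select_mxfp4_implem` and the `'torch'` default are assumptions for illustration only; the commit does not state the real default, and a later commit moves the kernels into Quark, so this switch may not exist in the merged code.

```python
import os
from typing import Mapping, Optional

# Backend names accepted by the environment variable, as listed in the commit message
VALID_IMPLEMS = ("hip", "triton", "torch")
# Assumed default for this sketch only; not taken from the PR
DEFAULT_IMPLEM = "torch"


def select_mxfp4_implem(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the q/dq/qdq backend requested via the environment."""
    env = os.environ if env is None else env
    value = env.get("VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM", DEFAULT_IMPLEM).lower()
    if value not in VALID_IMPLEMS:
        raise ValueError(
            f"VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM must be one of {VALID_IMPLEMS}, got {value!r}"
        )
    return value
```

Validating the value eagerly turns a misspelled backend name into an immediate error instead of a silent fallback.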

fix

fxmarty-amd committed 225 days ago

Move all kernels into Quark (#3)

fxmarty-amd committed 225 days ago

rebase fixup

fxmarty-amd committed 225 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 225 days ago

style

fxmarty-amd committed 225 days ago

fix style

fxmarty-amd committed 225 days ago

add test and documentation

fxmarty-amd committed 225 days ago

style

fxmarty-amd committed 225 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 223 days ago

fix conflicts

fxmarty-amd committed 223 days ago

style

fxmarty-amd committed 223 days ago

style bis

fxmarty-amd committed 223 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 219 days ago

add accuracy test

fxmarty-amd committed 219 days ago

style

fxmarty-amd committed 219 days ago

fix test

fxmarty-amd committed 219 days ago

address review comments

fxmarty-amd committed 218 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 211 days ago

merge fixes

fxmarty-amd committed 211 days ago

style

fxmarty-amd committed 211 days ago

skip tests if not enough gpus

fxmarty-amd committed 211 days ago

typo

fxmarty-amd committed 211 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 203 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 191 days ago

add missing args in examples

fxmarty-amd committed 190 days ago

remove VLLM_QUARK_EMU_MEM_OPT, always keep mxfp4 weights in low precision

fxmarty-amd committed 190 days ago
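Keeping MXFP4 weights in low precision, as the commit above does, means storing 4-bit element codes rather than expanded floats. A minimal sketch of packing FP4 E2M1 codes two per byte is shown below, assuming the standard E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) and a low-nibble-first order; the nibble order and the helper names are assumptions, not the layout used in the PR.

```python
from typing import List

# FP4 E2M1 magnitudes indexed by the 3-bit (exponent, mantissa) code; bit 3 is the sign
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def pack_fp4(codes: List[int]) -> bytes:
    """Hypothetical helper: pack 4-bit codes two per byte, low nibble first (assumed order)."""
    if len(codes) % 2:
        raise ValueError("expected an even number of 4-bit codes")
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("codes must fit in 4 bits")
        out.append(lo | (hi << 4))
    return bytes(out)


def unpack_fp4(packed: bytes) -> List[int]:
    """Inverse of pack_fp4: split each byte back into (low, high) nibbles."""
    codes: List[int] = []
    for byte in packed:
        codes.extend((byte & 0x0F, byte >> 4))
    return codes


def fp4_code_to_float(code: int) -> float:
    """Decode one E2M1 code: look up the magnitude, then apply the sign bit."""
    magnitude = FP4_E2M1_VALUES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude
```

Packed this way, a weight tensor takes half a byte per element plus one E8M0 byte per 32-element block, roughly a quarter of its BF16 size.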

use emulate=True for mxfp4 gemm on cdna4 until real kernels are integrated

fxmarty-amd committed 190 days ago
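The commit above falls back to `emulate=True` for the MXFP4 GEMM on CDNA4 until native kernels are integrated. A common way to emulate a low-precision GEMM is to dequantize the weights to a higher precision and run an ordinary matrix multiply. The sketch below illustrates that idea under the assumption that emulation works this way; `dequant_row` and `emulated_mxfp4_linear` are hypothetical names, and plain Python lists stand in for tensors.

```python
from typing import List

FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32   # one shared scale per 32 weights (assumed MX block size)
E8M0_BIAS = 127   # E8M0 scales are stored as biased exponents


def dequant_row(codes: List[int], scales: List[int]) -> List[float]:
    """Dequantize one weight row: FP4 value times its block's power-of-two scale."""
    row: List[float] = []
    for i, code in enumerate(codes):
        magnitude = FP4_E2M1_VALUES[code & 0b0111]
        value = -magnitude if code & 0b1000 else magnitude
        row.append(value * 2.0 ** (scales[i // BLOCK_SIZE] - E8M0_BIAS))
    return row


def emulated_mxfp4_linear(
    x: List[List[float]], w_codes: List[List[int]], w_scales: List[List[int]]
) -> List[List[float]]:
    """y = x @ W^T, with W expanded to float first instead of using an FP4 GEMM."""
    weights = [dequant_row(c, s) for c, s in zip(w_codes, w_scales)]
    return [[sum(a * b for a, b in zip(row_x, row_w)) for row_w in weights] for row_x in x]
```

Emulation trades speed for portability: the numerics match MXFP4 weights, but the multiply runs at the higher precision.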

add slow/non-optimized reference torch mxfp4 quant and qdq implementation

fxmarty-amd committed 190 days ago
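The commit above adds a slow, non-optimized reference implementation of MXFP4 quantization and quantize-dequantize (qdq). A pure-Python sketch of the same idea is shown below, assuming the OCP MX conventions: 32-element blocks, a shared power-of-two scale of 2^(floor(log2(amax)) - 2), FP4 E2M1 elements rounded to nearest with ties to even, and saturation at ±6. It is an illustration, not the torch implementation from the PR, and all names are hypothetical.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32
FP4_E2M1_EMAX = 2
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(v: float) -> float:
    """Nearest FP4 E2M1 value; ties go to the even code (mantissa bit 0); saturate at 6."""
    mag = min(abs(v), FP4_E2M1_VALUES[-1])
    best = min(range(len(FP4_E2M1_VALUES)),
               key=lambda i: (abs(FP4_E2M1_VALUES[i] - mag), i % 2))
    return math.copysign(FP4_E2M1_VALUES[best], v)


def mxfp4_quantize(x: List[float]) -> Tuple[List[float], List[int]]:
    """Return FP4 element values and one unbiased shared exponent per block."""
    elements: List[float] = []
    exponents: List[int] = []
    for start in range(0, len(x), BLOCK_SIZE):
        block = x[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # frexp: amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1
        exp = math.frexp(amax)[1] - 1 - FP4_E2M1_EMAX if amax > 0 else 0
        exponents.append(exp)
        elements.extend(round_to_fp4(v / 2.0 ** exp) for v in block)
    return elements, exponents


def mxfp4_dequantize(elements: List[float], exponents: List[int]) -> List[float]:
    """Multiply each element by its block's power-of-two scale."""
    return [v * 2.0 ** exponents[i // BLOCK_SIZE] for i, v in enumerate(elements)]


def mxfp4_qdq(x: List[float]) -> List[float]:
    """Quantize then dequantize: simulates MXFP4 rounding error in full precision."""
    return mxfp4_dequantize(*mxfp4_quantize(x))
```

A qdq reference like this is mainly useful for checking faster kernels: both should produce identical values on the same input.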

style

fxmarty-amd committed 190 days ago

style 2

fxmarty-amd committed 190 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 180 days ago

style

fxmarty-amd committed 180 days ago

fix tests

fxmarty-amd committed 180 days ago

linting

fxmarty-amd committed 180 days ago

Merge branch 'main' into mxfp4_moe

fxmarty-amd committed 169 days ago

fix updates with main and address comments

fxmarty-amd committed 169 days ago

linting

fxmarty-amd committed 169 days ago

linting 2

fxmarty-amd committed 169 days ago

update doc

fxmarty-amd committed 169 days ago

linting 3

fxmarty-amd committed 169 days ago

import fused_experts lazily

fxmarty-amd committed 168 days ago
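The commit above imports `fused_experts` lazily, that is, at call time rather than when the module is first imported, which is a common way to avoid circular imports or loading heavy dependencies up front. A minimal sketch of the pattern is shown below; `lazy_attr` and `apply_experts` are hypothetical, and the stdlib `math.fsum` stands in for `fused_experts` so the example runs without vLLM installed.

```python
import importlib
from functools import lru_cache
from typing import Any, Iterable


@lru_cache(maxsize=None)
def lazy_attr(module_name: str, attr_name: str) -> Any:
    """Import `module_name` on first call and cache the requested attribute."""
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def apply_experts(values: Iterable[float]) -> float:
    # Hypothetical stand-in: vLLM would resolve `fused_experts` here; `math.fsum`
    # is used instead so the sketch is self-contained.
    fused_fn = lazy_attr("math", "fsum")
    return fused_fn(values)
```

A plain function-local `from ... import ...` achieves the same effect; Python caches loaded modules in `sys.modules`, so repeated deferred imports are cheap.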

pass activation arg

fxmarty-amd committed 168 days ago

remove per_channel_quant=True

fxmarty-amd committed 168 days ago

vllm [Feature][Quantization] MXFP4 support for MOE models #17888 Merged