vllm
[Feature][Quantization] MXFP4 support for MOE models
#17888
Merged

Commits
  • MXFP4
    fxmarty-amd committed 230 days ago
  • Separate moe to another PR
    fxmarty-amd committed 230 days ago
  • lint
    fxmarty-amd committed 230 days ago
  • wip
    fxmarty-amd committed 225 days ago
  • large moe support
    fxmarty-amd committed 225 days ago
  • use kernels
    fxmarty-amd committed 225 days ago
  • use dynamic quant kernel for moe activation
    fxmarty-amd committed 225 days ago
  • add kernel/non-kernel branches for mxfp4
    fxmarty-amd committed 225 days ago
  • wip
    fxmarty-amd committed 225 days ago
  • large moe support
    fxmarty-amd committed 225 days ago
  • set VLLM_QUARK_MXFP4_Q_DQ_QDQ_IMPLEM to 'hip', 'triton' or 'torch' to select the q/dq/qdq implem for mxfp4
    fxmarty-amd committed 225 days ago
  • fix
    fxmarty-amd committed 225 days ago
  • Move all kernels into Quark (#3)
    fxmarty-amd committed 225 days ago
  • rebase fixup
    fxmarty-amd committed 225 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 225 days ago
  • style
    fxmarty-amd committed 225 days ago
  • fix style
    fxmarty-amd committed 225 days ago
  • add test and documentation
    fxmarty-amd committed 225 days ago
  • style
    fxmarty-amd committed 225 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 223 days ago
  • fix conflicts
    fxmarty-amd committed 223 days ago
  • style
    fxmarty-amd committed 223 days ago
  • style bis
    fxmarty-amd committed 223 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 219 days ago
  • add accuracy test
    fxmarty-amd committed 219 days ago
  • style
    fxmarty-amd committed 219 days ago
  • fix test
    fxmarty-amd committed 219 days ago
  • address review comments
    fxmarty-amd committed 218 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 211 days ago
  • merge fixes
    fxmarty-amd committed 211 days ago
  • style
    fxmarty-amd committed 211 days ago
  • skip tests if not enough gpus
    fxmarty-amd committed 211 days ago
  • typo
    fxmarty-amd committed 211 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 203 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 191 days ago
  • add missing args in examples
    fxmarty-amd committed 190 days ago
  • remove VLLM_QUARK_EMU_MEM_OPT, always keeps mxfp4 weights in low precision
    fxmarty-amd committed 190 days ago
  • use emulate=True for mxfp4 gemm on cdna4 until real kernels are integrated
    fxmarty-amd committed 190 days ago
  • add slow/non-optimized reference torch mxfp4 quant and qdq implementation
    fxmarty-amd committed 190 days ago
  • style
    fxmarty-amd committed 190 days ago
  • style 2
    fxmarty-amd committed 190 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 180 days ago
  • style
    fxmarty-amd committed 180 days ago
  • fix tests
    fxmarty-amd committed 180 days ago
  • linting
    fxmarty-amd committed 180 days ago
  • Merge branch 'main' into mxfp4_moe
    fxmarty-amd committed 169 days ago
  • fix updates with main and address comments
    fxmarty-amd committed 169 days ago
  • linting
    fxmarty-amd committed 169 days ago
  • linting 2
    fxmarty-amd committed 169 days ago
  • update doc
    fxmarty-amd committed 169 days ago
  • linting 3
    fxmarty-amd committed 169 days ago
  • import fused_experts lazily
    fxmarty-amd committed 168 days ago
  • pass activation arg
    fxmarty-amd committed 168 days ago
  • remove per_channel_quant=True
    fxmarty-amd committed 168 days ago