vllm
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size
#30897

Merged

vllm-bot merged 5 commits into vllm-project:main from neuralmagic:nvfp4-quant-tune-small-m

Tune scaled_fp4_quant for small M

c18802f5

mergify added performance

mgoin changed the title ~~[NVFP4][Perf] Tune scaled_fp4_quant for small M~~ [NVFP4][Perf] Tune `scaled_fp4_quant` for small M 143 days ago

gemini-code-assist commented on 2025-12-17

Fix import

c3fdb023

Optimize cvt_quant_to_fp4_get_sf_out_offset

238d6556

Optimize cvt_quant_to_fp4_get_sf_out_offset

7647027b

mgoin requested a review from LucasWilkinson 143 days ago

mgoin requested a review from pavanimajety 143 days ago

mgoin requested a review from alexm-redhat 143 days ago

mgoin requested a review from robertgshaw2-redhat 143 days ago

mgoin changed the title ~~[NVFP4][Perf] Tune `scaled_fp4_quant` for small M~~ [NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size 143 days ago

mgoin added quantization

mgoin added ready

mgoin added nvidia

Merge branch 'main' into nvfp4-quant-tune-small-m

02aad69c

pavanimajety commented on 2025-12-18

pavanimajety approved these changes on 2025-12-18

robertgshaw2-redhat removed quantization

vllm-bot merged 06d49028 into main 139 days ago

Reviewers

pavanimajety

gemini-code-assist

LucasWilkinson

alexm-redhat

robertgshaw2-redhat

Labels

performance ready nvidia
