vllm
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size
#30897
Merged

mgoin
mgoin Tune scaled_fp4_quant for small M
c18802f5
chatgpt-codex-connector
mergify added performance
mgoin changed the title from "[NVFP4][Perf] Tune scaled_fp4_quant for small M" to "[NVFP4][Perf] Tune `scaled_fp4_quant` for small M" 143 days ago
gemini-code-assist
gemini-code-assist commented on 2025-12-17
mgoin Fix import
c3fdb023
mgoin Optimize cvt_quant_to_fp4_get_sf_out_offset
238d6556
mgoin Optimize cvt_quant_to_fp4_get_sf_out_offset
7647027b
mgoin requested a review from LucasWilkinson 143 days ago
mgoin requested a review from pavanimajety 143 days ago
mgoin requested a review from alexm-redhat 143 days ago
mgoin requested a review from robertgshaw2-redhat 143 days ago
mgoin changed the title from "[NVFP4][Perf] Tune `scaled_fp4_quant` for small M" to "[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size" 143 days ago
mgoin added quantization
mgoin added ready
mgoin added nvidia
mgoin Merge branch 'main' into nvfp4-quant-tune-small-m
02aad69c
pavanimajety
pavanimajety commented on 2025-12-18
pavanimajety
pavanimajety commented on 2025-12-18
pavanimajety
pavanimajety approved these changes on 2025-12-18
robertgshaw2-redhat removed quantization
vllm-bot merged 06d49028 into main 139 days ago
