[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size #30897
Tune scaled_fp4_quant for small M
c18802f5
mgoin changed the title from "[NVFP4][Perf] Tune scaled_fp4_quant for small M" to "[NVFP4][Perf] Tune `scaled_fp4_quant` for small M" 143 days ago
Fix import
c3fdb023
Optimize cvt_quant_to_fp4_get_sf_out_offset
238d6556
Optimize cvt_quant_to_fp4_get_sf_out_offset
7647027b
mgoin changed the title from "[NVFP4][Perf] Tune `scaled_fp4_quant` for small M" to "[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size" 143 days ago
Merge branch 'main' into nvfp4-quant-tune-small-m
02aad69c
vllm-bot merged commit 06d49028 into main 139 days ago
Assignees: No one assigned
Labels: performance, ready, nvidia