vllm
[Perf] Fuse Zero Initializer for FP8 DeepGemm Block Quant Kernel
#39547
Merged

wzhao18
wzhao18 marked this pull request as ready for review 33 days ago
wzhao18 requested a review from mgoin 33 days ago
wzhao18 requested a review from tlrmchlsmth 33 days ago
wzhao18 requested a review from WoosukKwon 33 days ago
wzhao18 requested a review from yewentao256 33 days ago
gemini-code-assist commented on 2026-04-10
mgoin approved these changes on 2026-04-10
mgoin added performance
mgoin added ready
mgoin added nvidia
wzhao18 fuse fp8 packed quant zero init into quant kernel
200a0d4c
wzhao18 fixup
bdf7e5e8
wzhao18 fixup
99c8ab30
wzhao18 Update comments
434e1c46
wzhao18 Simplify tests
1db6ccf8
wzhao18 Fix sync threads divergence
0b680696
wzhao18 Update comments
6d515b45
wzhao18 Explicitly set device=cpu in tests
6f90e9f1
wzhao18 test with poisoned scales
e77a6515
wzhao18 force pushed to e77a6515 33 days ago
vllm-bot merged 59b2f7b6 into main 32 days ago

Assignees
No one assigned