[Perf] Fuse Zero Initializer for FP8 DeepGemm Block Quant Kernel #39547
wzhao18 marked this pull request as ready for review 33 days ago
mgoin approved these changes on 2026-04-10
fuse fp8 packed quant zero init into quant kernel (200a0d4c)
fixup (bdf7e5e8)
fixup (99c8ab30)
Update comments (434e1c46)
Simplify tests (1db6ccf8)
Fix sync threads divergence (0b680696)
Update comments (6d515b45)
Explicitly set device=cpu in tests (6f90e9f1)
test with poisoned scales (e77a6515)
wzhao18 force pushed to e77a6515 33 days ago
vllm-bot merged 59b2f7b6 into main 32 days ago
Assignees: No one assigned
Labels: performance, ready, nvidia