vllm
[Perf] Optimize cutlass fp8 scaled mm bypassing padding, 20% kernel performance improvement
#43706
Merged

yewentao256 merged 7 commits into main from wentao-optimize-cutlassfp8
yewentao256 optimize cutlass fp8
1bfd0971
yewentao256 added the ready label
mergify added the nvidia label
lgeiger commented on 2026-05-26
yewentao256 update
6db67c12
Isotr0py approved these changes on 2026-05-28
yewentao256 fix unit test
18ac0c69
mergify[bot] Merge branch 'main' into wentao-optimize-cutlassfp8
8026361f
yewentao256 Merge branch 'main' into wentao-optimize-cutlassfp8
846915cb
yewentao256 Merge branch 'main' into wentao-optimize-cutlassfp8
afbd0071
yewentao256 fix ci
83c09494
yewentao256 merged 985c97a6 into main 27 days ago
yewentao256 deleted the wentao-optimize-cutlassfp8 branch 27 days ago
