[Perf] Optimize cutlass fp8 scaled mm bypassing padding, 20% kernel performance improvement #43706
optimize cutlass fp8 (1bfd0971)
update (6db67c12)
Isotr0py approved these changes on 2026-05-28
fix unit test (18ac0c69)
Merge branch 'main' into wentao-optimize-cutlassfp8 (8026361f)
Merge branch 'main' into wentao-optimize-cutlassfp8 (846915cb)
Merge branch 'main' into wentao-optimize-cutlassfp8 (afbd0071)
fix ci (83c09494)
yewentao256 deleted the wentao-optimize-cutlassfp8 branch 27 days ago