Refactor ModelOptFp8MoEMethod to use modular kernels
Apply the modular kernel refactoring pattern from PR #30825 to
ModelOptFp8MoEMethod. This enables cleaner kernel dispatch and better
support for tensor parallel configurations.
Changes:
- The select_gemm_impl() method already existed and correctly returns
  CUTLASS experts via select_cutlass_fp8_gemm_impl(), so no changes
  were needed there
- Simplified apply() to handle only the TENSORRT_LLM path, which does
  not go through the modular kernel
- Removed unreachable CUTLASS and fallback paths from apply() since
they are now handled by FusedMoEModularMethod
- Removed the now-unused flashinfer_cutlass_moe_fp8 import
The modular kernel infrastructure now handles the CUTLASS and TP/EP
backends by wrapping the quant method in FusedMoEModularMethod, which
calls select_gemm_impl() to obtain the appropriate experts
implementation.
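The wrapping pattern described above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical stand-in classes; the real FusedMoEModularMethod, ModelOptFp8MoEMethod, and experts implementations live in vLLM and have richer signatures.

```python
# Hypothetical stand-ins sketching the modular-kernel dispatch pattern.
# Names mirror the vLLM classes but the bodies are placeholders.

class CutlassExpertsSketch:
    """Stand-in for the experts impl returned by select_gemm_impl()."""

    def run(self, x):
        # Placeholder for the fused CUTLASS grouped-GEMM compute.
        return [v * 2 for v in x]


class QuantMethodSketch:
    """Stand-in for ModelOptFp8MoEMethod after the refactor."""

    def select_gemm_impl(self):
        # In vLLM this returns CUTLASS experts via
        # select_cutlass_fp8_gemm_impl().
        return CutlassExpertsSketch()

    def apply(self, x):
        # Post-refactor, apply() covers only the TENSORRT_LLM path;
        # every other backend is dispatched by the wrapper below.
        raise NotImplementedError(
            "non-TRTLLM paths are handled by the modular method")


class ModularMethodSketch:
    """Stand-in for FusedMoEModularMethod: wraps the quant method and
    dispatches to the experts impl chosen by select_gemm_impl()."""

    def __init__(self, quant_method):
        self.experts = quant_method.select_gemm_impl()

    def apply(self, x):
        return self.experts.run(x)


modular = ModularMethodSketch(QuantMethodSketch())
print(modular.apply([1, 2, 3]))  # → [2, 4, 6]
```

The point of the wrapper is that apply() on the quant method no longer needs CUTLASS or fallback branches: backend selection happens once, at wrap time, via select_gemm_impl().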
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>