DeepSpeed
87f1979d - feat: Add Triton-optimized Muon optimizer implementation

Commit
47 days ago
feat: Add Triton-optimized Muon optimizer implementation

- Implement triton_muon.py with fallback to PyTorch when Triton is not available
- Add smart selection between Triton and PyTorch implementations based on matrix size
- Maintain backward compatibility with existing code
- Add performance-aware matrix size detection for optimal kernel selection

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
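
The selection logic described in the commit message could look roughly like the sketch below. It is not the code from triton_muon.py: the function names, the `MIN_TRITON_DIM` threshold, and the rule of comparing the smaller matrix dimension against that threshold are all assumptions made for illustration. It only shows the general pattern of detecting Triton at runtime and falling back to PyTorch for small matrices or when Triton is missing.

```python
import importlib.util
from typing import Optional

# Hypothetical threshold, NOT taken from the commit: below this size the
# sketch assumes kernel launch overhead outweighs any Triton speedup.
MIN_TRITON_DIM = 512


def triton_available() -> bool:
    """Return True if the `triton` package can be found in this environment."""
    try:
        return importlib.util.find_spec("triton") is not None
    except (ImportError, ValueError):
        # Treat any lookup problem as "Triton not available".
        return False


def select_backend(rows: int, cols: int,
                   has_triton: Optional[bool] = None,
                   min_dim: int = MIN_TRITON_DIM) -> str:
    """Pick "triton" or "pytorch" for a rows x cols matrix (illustrative rule)."""
    if has_triton is None:
        has_triton = triton_available()
    # Fallback path: without Triton, the PyTorch implementation is the only option.
    if not has_triton:
        return "pytorch"
    # Size-based path: small matrices stay on the PyTorch implementation.
    if min(rows, cols) < min_dim:
        return "pytorch"
    return "triton"
```

A caller would check `select_backend(*weight.shape)` once per parameter and dispatch to the matching implementation. Because the PyTorch path is always a valid answer, existing code keeps working when Triton is not installed.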