transformers 2d01eb3f: handle dense + sparse mixing for mellum model in SP plan
Commit
2 days ago
References
distributed
#46269 - Distributed branch base
#46290 - Tp param level
#46292 - sp + ep training / tp + ep inference
Author
3outeille
Parents
f914e301