DeepSpeed 37011a92 - Reduce tied weight gradients
Committed: 3 years ago
References: #1801 - bf16+pipeline parallelism
Author: tjruwase
Parents: 35ea3808