DeepSpeed
47b3fb5e
- Fixed an error when loading a universal checkpoint in multi-machine mode. (#7601)
Commit
71 days ago
Fixed an error when loading a universal checkpoint in multi-machine mode (#7601). In a multi-machine environment, loading a ZeRO stage 3 universal checkpoint produced incorrect results, causing the loss to increase abnormally.
References
#7601 - Fixed an error when loading a universal checkpoint in multi-machine mode.
Author: zhengchenyu
Parents: 66c70312