onnxruntime
Add Distributed Checkpointing support
#3639
Merged

Commits
  • Change naming of moments to Moment_x_<weight_name>
    ashbhandare committed 5 years ago
  • Modify zero test to hit bert like scenario.
    ashbhandare committed 5 years ago
  • Revert "Modify zero test to hit bert like scenario."
    ashbhandare committed 5 years ago
  • Add checkpointing code and zero checkpoint aggregation
    ashbhandare committed 5 years ago
  • Correct aggregation for LAMB, cleanup
    ashbhandare committed 5 years ago
  • Add simple checkpointing test
    ashbhandare committed 5 years ago
  • Add test for zero checkpoint aggregation
    ashbhandare committed 5 years ago
  • Fix tests
    ashbhandare committed 5 years ago
  • fix test
    ashbhandare committed 5 years ago
  • Review changes
    ashbhandare committed 5 years ago
  • Fix test after review comment fix
    ashbhandare committed 5 years ago
  • Fix API, test
    ashbhandare committed 5 years ago
  • Fix test after API change
    ashbhandare committed 5 years ago
  • Decouple save load from ORTTrainer
    ashbhandare committed 5 years ago
  • Add flag to not break checkpointing with ORTModel
    ashbhandare committed 5 years ago