DeepSpeed
Checkpoint reshaping
#1953
Merged

Commits
  • unit test, remove exception, add notes
    jeffra committed 3 years ago
  • Merge branch 'master' of github.com:microsoft/DeepSpeed into elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Move param_shapes to model files
    tjruwase committed 3 years ago
  • Remove hard-coded constants
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/relocate_param_shapes' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/relocate_param_shapes
    tjruwase committed 3 years ago
  • Conditioned to zero optimizer
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/relocate_param_shapes' of github.com:microsoft/DeepSpeed into olruwase/relocate_param_shapes
    tjruwase committed 3 years ago
  • Add zero checkpoint merging
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/relocate_param_shapes' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/relocate_param_shapes
    jeffra committed 3 years ago
  • Print checkpoint version
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/relocate_param_shapes' of github.com:microsoft/DeepSpeed into olruwase/relocate_param_shapes
    tjruwase committed 3 years ago
  • Merge with relocate_param_shapes
    tjruwase committed 3 years ago
  • Reshape zero_* ckpt files
    tjruwase committed 3 years ago
  • Merge zero* files contraction
    tjruwase committed 3 years ago
  • Utils for 3D contraction reshaping
    tjruwase committed 3 years ago
  • Rebase
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Remove bogus import
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/elastic-ckpt-refresh' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Support bf16_zero ckpts
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/elastic-ckpt-refresh' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Add param slice mappings
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Load universal checkpoints
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/elastic-ckpt-refresh' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Per group mappings from Stas
    tjruwase committed 3 years ago
  • Hack to load bf16 zero files
    tjruwase committed 3 years ago
  • Param attributes
    tjruwase committed 3 years ago
  • WIP
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Fix api bug
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/elastic-ckpt-refresh' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Update lp with local/remote hp
    tjruwase committed 3 years ago
  • Disable vocab padding handling
    tjruwase committed 3 years ago
  • Update z2 checkpoint
    tjruwase committed 3 years ago
  • Remove debug prints
    tjruwase committed 3 years ago
  • Remove debug prints; Rebase unit test
    tjruwase committed 3 years ago
  • Add reshape assert
    tjruwase committed 3 years ago
  • Padding
    tjruwase committed 3 years ago
  • Typo
    tjruwase committed 3 years ago
  • Catch nonexistent checkpoint path
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Cleanup
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/elastic-ckpt-refresh' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Restore checkpoint state comparisons
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    jeffra committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Add torch version guards
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • More precise avoidance of false positives.
    tjruwase committed 3 years ago
  • Merge branch 'olruwase/elastic-ckpt-refresh' of github.com:microsoft/DeepSpeed into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago
  • Merge branch 'master' into olruwase/elastic-ckpt-refresh
    tjruwase committed 3 years ago