DeepSpeed
6d9c3dc0 - Fix Nebula checkpoint engine commit() API mismatch (#7740)

Commit
80 days ago
Fix Nebula checkpoint engine commit() API mismatch (#7740)

## Summary
- Fix `AttributeError: 'str' object has no attribute 'tag'` when using Nebula checkpoint engine
- Pass `CheckpointCommitInfo` object instead of raw `tag` string to `checkpoint_engine.commit()`

## Description
The `CheckpointEngine.commit()` interface expects a `CheckpointCommitInfo` object, but two call sites in `engine.py` were passing a raw `tag` string instead:

1. `save_checkpoint()` at line 3695
2. `save_16bit_model()` at line 4230

This worked with `TorchCheckpointEngine` because it ignores the parameter, but `NebulaCheckpointEngine` accesses `info.tag`, causing the crash.

## Changes
- **Line 3695**: Create `CheckpointCommitInfo` object before calling `commit()`
- **Line 4231**: Use existing `commit_info` variable instead of `tag`

## Test plan
- [ ] Verify Nebula checkpoint engine saves work without `AttributeError`
- [ ] Verify TorchCheckpointEngine still works (no regression)
- [ ] Run existing checkpoint-related unit tests

Fixes #7678

Signed-off-by: Rakshit-gen <sisodiarakshit456@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
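
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not DeepSpeed's actual code: `ToyCommitInfo`, `ToyTorchEngine`, `ToyNebulaEngine`, and `try_commit` are hypothetical stand-ins, and only the `tag` attribute named in the commit message is modeled. It shows why an engine that ignores its argument hides the bug while one that reads `info.tag` exposes it.

```python
from dataclasses import dataclass


@dataclass
class ToyCommitInfo:
    """Hypothetical stand-in for CheckpointCommitInfo; only `tag` is modeled."""
    tag: str


class ToyTorchEngine:
    """Stand-in for an engine whose commit() ignores its parameter."""

    def commit(self, info):
        # The argument is never inspected, so a raw string goes unnoticed.
        return True


class ToyNebulaEngine:
    """Stand-in for an engine whose commit() reads info.tag."""

    def commit(self, info):
        # Accessing .tag on a plain str raises AttributeError.
        return f"committed {info.tag}"


def try_commit(engine, arg):
    """Call engine.commit(arg) and report an AttributeError as a string."""
    try:
        return engine.commit(arg)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    tag = "global_step100"  # illustrative tag value
    print(try_commit(ToyTorchEngine(), tag))                     # buggy call, silently accepted
    print(try_commit(ToyNebulaEngine(), tag))                    # buggy call, fails
    print(try_commit(ToyNebulaEngine(), ToyCommitInfo(tag=tag)))  # fixed call
```

Passing the info object at both call sites keeps every engine on the same contract, regardless of whether a given engine happens to read the argument.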