DeepSpeed
Add DataStates-LLM: Asynchronous Checkpointing Engine Support
#7166
Merged

sfc-gh-truwase merged 10 commits into deepspeedai:master from DataStates:dev
mauryaavinash95 requested a review from tjruwase 297 days ago
mauryaavinash95 requested a review from tohtana 297 days ago
mauryaavinash95 requested a review from jomayeri 297 days ago
mauryaavinash95 requested a review from loadams 297 days ago
mauryaavinash95 requested a review from GuanhuaWang 297 days ago
mauryaavinash95 requested a review from hwchen2017 297 days ago
mauryaavinash95 changed the title from "Add DataStates-LLM: Asynchronous Checkpointing Engine Support #5763" to "Add DataStates-LLM: Asynchronous Checkpointing Engine Support" 297 days ago
tjruwase commented on 2025-03-25
tjruwase commented on 2025-03-25
mauryaavinash95 force pushed from 968f6ca5 to 1c701d7c 290 days ago
mauryaavinash95 requested a review from tjruwase 290 days ago
tjruwase commented on 2025-03-29
mauryaavinash95 requested a review from tjruwase 286 days ago
tjruwase commented on 2025-04-02
tjruwase commented on 2025-04-02
tjruwase commented on 2025-04-02
tjruwase commented on 2025-04-02
tjruwase approved these changes on 2025-04-02
mauryaavinash95 force pushed from 11dd8437 to 3a820715 272 days ago
mauryaavinash95 force pushed from 3a820715 to 84f067b6 272 days ago
mauryaavinash95 force pushed from 84f067b6 to 6160140e 272 days ago
mauryaavinash95 force pushed from 6160140e to 4651ec29 272 days ago
mauryaavinash95 requested a review from tjruwase 272 days ago
mauryaavinash95 closed this 101 days ago
mauryaavinash95 force pushed from e2cf199c to 7d9a2f2b 101 days ago
mauryaavinash95 reopened this 101 days ago
mauryaavinash95 force pushed from aace707a to c196e62a 101 days ago
mauryaavinash95 closed this 101 days ago
mauryaavinash95 force pushed from c196e62a to 7d9a2f2b 101 days ago
mauryaavinash95 reopened this 101 days ago
Update datastates using decoupled checkpointing APIs (fix pre-commit)
0270a7b9
mauryaavinash95 force pushed from acc4dc31 to 0270a7b9 101 days ago
sfc-gh-truwase commented on 2025-10-11
sfc-gh-truwase commented on 2025-10-15
sfc-gh-truwase approved these changes on 2025-10-15
Add persistence when committing checkpoints
8db3a488
mauryaavinash95 Import datastates checkpoint engine locally
3a84c230
Export DataStates engine from runtime/checkpoint_engine
9e261e1b
mauryaavinash95 force pushed from bd9ef2b6 to 9e261e1b 89 days ago
DataStates set commit_info=None after committing
0a6ff214
mauryaavinash95 Merge branch 'master' into dev
55dfe7e0
mauryaavinash95 Merge branch 'master' into dev
0ea44730
Fix datastates ImportError
867523fc
sfc-gh-truwase enabled auto-merge (squash) 84 days ago
mauryaavinash95 Merge branch 'master' into dev
4188eba3
mauryaavinash95 Merge branch 'master' into dev
d38bb0e8
sfc-gh-truwase merged d1e62ff2 into master 81 days ago

Assignees: No one assigned