transformers
feat(trainer): Just-in-time (JIT) asynchronous checkpointing using SIGTERM signals
#41723
Merged

efazal
efazal
Rocketknight1
efazal
efazal
sfc-gh-sbekman
efazal
stas00
efazal
stas00
efazal force pushed from a457b744 to 05411f37 69 days ago
efazal
stas00
stas00 commented on 2025-10-22
stas00
efazal
efazal
efazal
efazal force pushed from e093fcd7 to 5b21304d 63 days ago
efazal
SunMarc requested a review from ArthurZucker 56 days ago
SunMarc requested a review from SunMarc 56 days ago
SunMarc
efazal
SunMarc requested a review from qgallouedec 54 days ago
SunMarc
efazal
efazal force pushed from 5b21304d to 16b0a402 43 days ago
SunMarc
SunMarc commented on 2025-11-19
efazal
efazal force pushed from 16b0a402 to 00a9c2b6 30 days ago
efazal Just-in-time (JIT) asynchronous checkpointing using SIGTERM signals a…
b95a867b
efazal Fix failing ci tests
a1389622
efazal Update JIT checkpoint code to remove CUDA streams and async checkpoin…
44433fb0
efazal Fix sentinel file save path to checkpoint folder, update checkpoint r…
4ab2427d
efazal Refactor JIT checkpoint logic: rename methods and variables for clari…
929d2dcc
efazal Remove support for environment variable overrides in `TrainingArgumen…
1eb7f1f2
efazal force pushed from 434c4b58 to 1eb7f1f2 29 days ago
SunMarc Merge branch 'main' into feat-jit-checkpointing
6e6f4c26
SunMarc
github-actions
github-actions[bot] Apply style fixes
57a9df16
SunMarc
SunMarc approved these changes on 2025-12-03
HuggingFaceDocBuilderDev
efazal
SunMarc
SunMarc merged fda2d735 into main 26 days ago

Assignees
No one assigned