SemanticDiff

pytorch
f39b6624 - ChunkDataset checkpoint support (#21889)


Commit

5 years ago

ChunkDataset checkpoint support (#21889)

Summary: When dealing with large-scale datasets, it is handy to be able to save the dataset status and resume later. In particular, if an unexpected crash happens, users don't need to start over from the beginning of the dataset. Instead, they can reload it from the last checkpoint.

This change adds support for checkpoint save/load logic in ChunkDataset. On ChunkDataset construction, the user can specify a file name from which to load the checkpoint. If it is empty, the dataset starts fresh by default; otherwise the ChunkDataset will 'fast forward' the chunk sampler to the corresponding checkpoint.

The user can also call ChunkDataset::save() to serialize the current status to a file, which can be used later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21889

Differential Revision: D16024582

Pulled By: ailzhang

fbshipit-source-id: 1862ab5116f94c9d29da174ce04a91041d06cad5
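
The save/resume behavior described in the summary can be illustrated with a small standalone sketch. It does not use the torch::data API and is not the code from this commit: the class name ToyChunkSampler, its constructor arguments, and the plain-text checkpoint format are all assumptions made for illustration. It only shows the idea of an empty file name meaning "start fresh" and a non-empty one fast-forwarding the sampler to the saved position.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical toy sampler: hands out chunk indices 0..chunk_count-1 in order.
// This is NOT the torch::data API; it only illustrates the checkpoint idea.
class ToyChunkSampler {
 public:
  // An empty checkpoint_file means "start fresh"; otherwise the saved
  // position is read and the sampler is fast-forwarded to it.
  explicit ToyChunkSampler(std::size_t chunk_count,
                           const std::string& checkpoint_file = "")
      : chunk_count_(chunk_count), next_(0) {
    assert(chunk_count > 0);  // a dataset with no chunks is not meaningful here
    if (!checkpoint_file.empty()) {
      std::ifstream in(checkpoint_file.c_str());
      std::size_t saved = 0;
      if (!(in >> saved) || saved > chunk_count_) {
        throw std::runtime_error("cannot load checkpoint: " + checkpoint_file);
      }
      next_ = saved;  // skip every chunk consumed before the checkpoint
    }
  }

  // Writes the next chunk index to `index` and returns true, or returns
  // false once all chunks have been consumed.
  bool next(std::size_t& index) {
    if (next_ >= chunk_count_) return false;
    index = next_++;
    return true;
  }

  // Serializes the current position so a later run can resume from it.
  void save(const std::string& checkpoint_file) const {
    std::ofstream out(checkpoint_file.c_str(), std::ios::trunc);
    out << next_ << '\n';
    out.flush();
    if (!out) {
      throw std::runtime_error("cannot write checkpoint: " + checkpoint_file);
    }
  }

 private:
  std::size_t chunk_count_;
  std::size_t next_;
};
```

In this sketch, a sampler that has handed out chunks 0 and 1 and then calls save() writes position 2 to the file; a second sampler constructed with that file name resumes at chunk 2, while one constructed without a file name starts again at chunk 0. A shuffling sampler would also need to persist its ordering or random seed, which this toy version leaves out.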

Author

xzhu1900


Committer

facebook-github-bot


