pytorch
14923741 - Using deterministic hashing instead of GUID for pytorch serialization id generation (#101964)


Commit

1 year ago

Summary:
serialization_id was added in a previous change as a random GUID generated each time a module is saved, so that saved artifacts could be tracked. To avoid disturbing existing systems that rely on the serialized bytes being deterministic when the same module is serialized, this change derives the serialization id from a combined hash of the file names and the uncompressed content instead of a GUID. The hashing reuses the CRC32 values that are already calculated for zip writing, so the record data is not hashed a second time and no significant computational overhead is added.

The data descriptor is one of the headers in the ZIP file format (https://en.wikipedia.org/wiki/ZIP_(file_format)#Data_descriptor). It contains the CRC32 of the uncompressed data. By inspecting the data written in PyTorchStreamWriter, the CRC32 of each written record can be obtained. To make serialization_id a unique and deterministic id for the serialized files without extra computation, the updated `serialization_id` is computed over all written files and is composed of:
1) a combined hash of the record name hashes
2) a combined CRC32 of the records' uncompressed data

Example value: "15656915541136177431866432772"

Test Plan:
buck2 test @//mode/dev //caffe2/caffe2/serialize:inline_container_test

Differential Revision: D46038973
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101964
Approved by: https://github.com/davidberard98
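The serialization_id scheme described above can be sketched in Python as a rough illustration. This is not the PyTorch implementation: the commit does not show the combining function or the exact output formatting, so the sketch assumes a Boost-style `hash_combine` on 64-bit integers, a BLAKE2b-based 64-bit name hash, and plain decimal concatenation of the two parts. The helpers `hash_combine`, `name_hash`, `serialization_id` and `serialization_id_from_zip` are hypothetical. The second function shows the "reuse the CRC32 already computed for zip writing" idea by reading the per-entry CRC32 that Python's `zipfile` exposes as `ZipInfo.CRC`.

```python
import hashlib
import io
import zipfile
import zlib

MASK64 = (1 << 64) - 1


def hash_combine(seed: int, value: int) -> int:
    # Boost-style hash_combine on unsigned 64-bit integers (an assumption;
    # PyTorch's actual combining function may differ).
    mixed = (value + 0x9E3779B9 + ((seed << 6) & MASK64) + (seed >> 2)) & MASK64
    return (seed ^ mixed) & MASK64


def name_hash(name: str) -> int:
    # Deterministic 64-bit hash of a record name (hypothetical choice).
    # Python's built-in hash() is salted per process for str, so it would
    # break the determinism this scheme is meant to provide.
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def serialization_id(records: dict[str, bytes]) -> str:
    """Hypothetical id from records in write order: name part, then CRC32 part."""
    combined_name_hash = 0
    combined_crc32 = 0
    for name, data in records.items():
        combined_name_hash = hash_combine(combined_name_hash, name_hash(name))
        # zlib.crc32 is the CRC-32 that ZIP stores for each entry's uncompressed data.
        combined_crc32 = hash_combine(combined_crc32, zlib.crc32(data))
    return f"{combined_name_hash}{combined_crc32}"


def serialization_id_from_zip(archive: bytes) -> str:
    """Same id, reusing the CRC32 values already recorded in the ZIP headers."""
    combined_name_hash = 0
    combined_crc32 = 0
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():  # entries in the order they were written
            combined_name_hash = hash_combine(combined_name_hash, name_hash(info.filename))
            combined_crc32 = hash_combine(combined_crc32, info.CRC)
    return f"{combined_name_hash}{combined_crc32}"


if __name__ == "__main__":
    records = {"archive/data.pkl": b"pickled-module", "archive/data/0": bytes(16)}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in records.items():
            zf.writestr(name, data)
    # Both paths agree even though the archive is compressed, because the
    # stored CRC32 covers the uncompressed bytes.
    print(serialization_id(records) == serialization_id_from_zip(buf.getvalue()))
```

Because `hash_combine` is order-dependent, an id built this way also depends on the order in which records are written. That order should be stable as long as the writer emits records in a fixed sequence, which is what keeps repeated saves of the same module byte-identical.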

References

gh/soulitzer/208/base

gh/soulitzer/209/base

Author

atannous

Committer

pytorchmergebot

Parents

76af2210
