pytorch
68df4d40 - show_pickle/model_dump: Handle invalid UTF-8 in pickles (#57661)

4 years ago
show_pickle/model_dump: Handle invalid UTF-8 in pickles (#57661)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57661

The Pickle "specification" (pickletools.py) states that the argument to a BINUNICODE opcode must be UTF-8 encoded. However, if a PyTorch custom class returns a non-UTF-8 std::string from its pickle method, the libtorch Pickler will write it to the output pickle without complaining. Python's _Unpickler (the Python implementation of Unpickler) always raises an exception when trying to deserialize these invalid pickles.

We still want to be able to dump these pickle files. Update DumpUnpickler to create its own opcode dispatch table (initialized as a clone of the _Unpickler dispatch table) and patch in a custom function for the BINUNICODE op. We try to emulate the default behavior, but any UnicodeDecodeError is caught and replaced with a dummy object. This could violate the assumptions of a user that expects a str in that position, so we disable this behavior by default.

Update model_dump to recognize this special object and allow it to be rendered.

Test Plan:
Dumped and viewed a model with an invalid string in an object state.

Reviewed By: malfet

Differential Revision: D28531392

Pulled By: dreiss

fbshipit-source-id: ab5aea20975a0ef53ef52a880deaa2c5a626e4a2
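
The dispatch-table approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names TolerantUnpickler, InvalidUtf8Placeholder, binunicode_pickle, and the catch_invalid_utf8 keyword are hypothetical, and the sketch assumes CPython's pure-Python pickle._Unpickler, whose dispatch table, self.read, and self.append are private implementation details that may change between versions.

```python
import io
import pickle
import struct
import sys


class InvalidUtf8Placeholder:
    """Hypothetical dummy object standing in for an undecodable string."""

    def __init__(self, raw, reason):
        self.raw = raw        # the original, undecodable bytes
        self.reason = reason  # text of the UnicodeDecodeError

    def __repr__(self):
        return f"InvalidUtf8Placeholder({self.raw!r})"


class TolerantUnpickler(pickle._Unpickler):
    """Illustrative unpickler that can tolerate non-UTF-8 BINUNICODE args."""

    def __init__(self, file, *, catch_invalid_utf8=False, **kwargs):
        super().__init__(file, **kwargs)
        # Off by default: callers may rely on getting a real str back.
        self.catch_invalid_utf8 = catch_invalid_utf8

    # Clone the parent's opcode table so the patch stays local to this class.
    dispatch = dict(pickle._Unpickler.dispatch)

    def load_binunicode(self):
        # BINUNICODE: 4-byte little-endian length, then that many bytes.
        (length,) = struct.unpack("<I", self.read(4))
        if length > sys.maxsize:
            raise pickle.UnpicklingError("BINUNICODE string too long")
        raw = self.read(length)
        try:
            # Same decoding the stock implementation uses.
            obj = str(raw, "utf-8", "surrogatepass")
        except UnicodeDecodeError as exc:
            if not self.catch_invalid_utf8:
                raise
            obj = InvalidUtf8Placeholder(raw, str(exc))
        self.append(obj)

    dispatch[pickle.BINUNICODE[0]] = load_binunicode


def binunicode_pickle(payload: bytes) -> bytes:
    """Hand-build a protocol 2 pickle holding one BINUNICODE value."""
    return (
        pickle.PROTO + b"\x02"
        + pickle.BINUNICODE + struct.pack("<I", len(payload)) + payload
        + pickle.STOP
    )


if __name__ == "__main__":
    bad = binunicode_pickle(b"\xff\xfe")  # not valid UTF-8
    print(TolerantUnpickler(io.BytesIO(bad), catch_invalid_utf8=True).load())
```

With the flag left at its default, the same input still raises UnicodeDecodeError, which mirrors the opt-in behavior the commit message describes.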