show_pickle/model_dump: Handle invalid UTF-8 in pickles (#57661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57661
The pickle "specification" (pickletools.py) states that the argument to
a BINUNICODE opcode must be UTF-8 encoded. However, if a PyTorch custom
class returns a non-UTF-8 std::string from its pickle method, the
libtorch Pickler will write it to the output pickle without complaining.
Python's _Unpickler (the pure-Python implementation of Unpickler) always
raises a UnicodeDecodeError when trying to deserialize these invalid pickles.
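A quick way to reproduce the failure is to hand-assemble a protocol-2 pickle whose BINUNICODE payload is a lone 0xff byte, which can never appear in valid UTF-8. This minimal sketch assumes CPython, where the pure-Python unpickler is exposed as the private `pickle._Unpickler`; the helper name `try_load` is illustrative only.

```python
import io
import pickle

# Hand-built protocol-2 pickle: PROTO 2, BINUNICODE ("X") with a 4-byte
# little-endian length of 1, the payload b"\xff", then STOP (".").
# The payload b"\xff" is never valid UTF-8.
BAD_PICKLE = b"\x80\x02" + b"X" + (1).to_bytes(4, "little") + b"\xff" + b"."


def try_load(data: bytes):
    """Return the unpickled object, or the UnicodeDecodeError _Unpickler raises."""
    try:
        return pickle._Unpickler(io.BytesIO(data)).load()
    except UnicodeDecodeError as exn:
        return exn


result = try_load(BAD_PICKLE)
print(type(result).__name__)  # UnicodeDecodeError
```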
We still want to be able to dump these pickle files. Update
DumpUnpickler to create its own opcode dispatch table (initialized as a
clone of the _Unpickler dispatch table) and patch in a custom function
for the BINUNICODE op. We try to emulate the default behavior, but any
UnicodeDecodeError is caught and replaced with a dummy object. This
could violate the assumptions of a user that expects a str in that
position, so we disable this behavior by default.
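The dispatch-table approach can be sketched roughly as follows. This is not the actual DumpUnpickler code: it assumes CPython's private `pickle._Unpickler`, whose class-level `dispatch` dict maps opcode byte values to handler functions, and the names `InvalidUtf8` and `catch_invalid_utf8` are hypothetical.

```python
import io
import pickle
import struct
import sys


class InvalidUtf8:
    """Hypothetical placeholder for a BINUNICODE payload that is not valid UTF-8."""

    def __init__(self, raw: bytes, reason: str):
        self.raw = raw
        self.reason = reason

    def __repr__(self):
        return f"InvalidUtf8({self.raw!r})"


class DumpUnpickler(pickle._Unpickler):
    # Copy the base table so patching it does not affect pickle._Unpickler itself.
    dispatch = dict(pickle._Unpickler.dispatch)

    def __init__(self, file, *, catch_invalid_utf8=False, **kwargs):
        super().__init__(file, **kwargs)
        # Off by default: callers that expect a str keep the strict behavior.
        self.catch_invalid_utf8 = catch_invalid_utf8

    def load_binunicode(self):
        # Mirror the default handler: 4-byte little-endian length, then the payload.
        (strlen,) = struct.unpack("<I", self.read(4))
        if strlen > sys.maxsize:
            raise pickle.UnpicklingError("BINUNICODE exceeds sys.maxsize")
        data = self.read(strlen)
        try:
            obj = str(data, "utf-8", "surrogatepass")
        except UnicodeDecodeError as exn:
            if not self.catch_invalid_utf8:
                raise
            obj = InvalidUtf8(data, str(exn))
        self.append(obj)

    # Patch only the BINUNICODE ("X") entry in the copied table.
    dispatch[pickle.BINUNICODE[0]] = load_binunicode


BAD = b"\x80\x02X" + struct.pack("<I", 1) + b"\xff."
print(DumpUnpickler(io.BytesIO(BAD), catch_invalid_utf8=True).load())  # InvalidUtf8(b'\xff')
```

Copying the table with `dict(...)` keeps the override local to the subclass, so other users of `pickle._Unpickler` still get the strict handler.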
Update model_dump to recognize this special object and allow it to be
rendered.
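How model_dump renders the placeholder is not described here. As a hypothetical illustration, assuming the dump is converted to JSON-friendly data before display, a recursive converter could map the sentinel to a tagged dict holding the raw bytes in hex. `InvalidUtf8`, `to_renderable`, and the `__invalid_utf8__` key are all invented for this sketch.

```python
import json


class InvalidUtf8:
    """Hypothetical sentinel, as produced by a DumpUnpickler-style loader."""

    def __init__(self, raw: bytes):
        self.raw = raw


def to_renderable(obj):
    """Recursively convert an unpickled value into JSON-friendly data,
    turning invalid-UTF-8 sentinels into a visible marker instead of failing."""
    if isinstance(obj, InvalidUtf8):
        return {"__invalid_utf8__": obj.raw.hex()}
    if isinstance(obj, (list, tuple)):
        return [to_renderable(x) for x in obj]
    if isinstance(obj, dict):
        return {str(k): to_renderable(v) for k, v in obj.items()}
    return obj


print(json.dumps(to_renderable({"name": InvalidUtf8(b"\xff"), "n": 1})))
# {"name": {"__invalid_utf8__": "ff"}, "n": 1}
```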
Test Plan: Dumped and viewed a model with an invalid string in an object state.
Reviewed By: malfet
Differential Revision: D28531392
Pulled By: dreiss
fbshipit-source-id: ab5aea20975a0ef53ef52a880deaa2c5a626e4a2