Buffer to speed Unpickler (#27727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27727
This change uses a small buffer in the Unpickler to avoid
calling reader_() byte-by-byte. In particular, the unpickler has a
tight loop that reads 1-byte opcodes.
This is more efficient because the common fast path avoids the
variable-sized memcpy (due to templating) and the std::function
indirection.
This improves the unpickle-1m-ints benchmark by ~20%.
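
A minimal sketch of what such a buffered read path can look like
(the class and member names below are illustrative, not the actual
Unpickler code): the 1-byte opcode loop becomes a bounds check plus
an index into a local buffer, and the reader_ std::function is only
invoked when the buffer runs dry.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <utility>

// Sketch only: assumes a reader callback of the form
// size_t(char* buffer, size_t len) that returns the number of bytes
// actually produced (0 meaning end of data).
class BufferedReader {
 public:
  explicit BufferedReader(std::function<size_t(char*, size_t)> reader)
      : reader_(std::move(reader)) {}

  // Fast path for the 1-byte opcode loop: usually just an index into
  // the local buffer, no memcpy and no std::function call.
  uint8_t readByte() {
    if (pos_ == size_) {
      refill();
    }
    return static_cast<uint8_t>(buffer_[pos_++]);
  }

  // Fixed-size reads (lengths, doubles, etc.) are served from the
  // buffer when enough bytes are available, otherwise fall back to
  // the slow path.
  template <typename T>
  T read() {
    T value;
    if (size_ - pos_ >= sizeof(T)) {
      std::memcpy(&value, buffer_ + pos_, sizeof(T));
      pos_ += sizeof(T);
    } else {
      readSlow(reinterpret_cast<char*>(&value), sizeof(T));
    }
    return value;
  }

 private:
  void refill() {
    // Ask for "up to sizeof(buffer_)" bytes; the size_t return value
    // tells us how many we actually got.
    size_ = reader_(buffer_, sizeof(buffer_));
    pos_ = 0;
    if (size_ == 0) {
      throw std::runtime_error("unexpected end of pickle data");
    }
  }

  void readSlow(char* dest, size_t n) {
    // Drain whatever is still buffered, then read the rest directly.
    size_t have = std::min(n, size_ - pos_);
    std::memcpy(dest, buffer_ + pos_, have);
    pos_ += have;
    while (have < n) {
      size_t got = reader_(dest + have, n - have);
      if (got == 0) {
        throw std::runtime_error("unexpected end of pickle data");
      }
      have += got;
    }
  }

  std::function<size_t(char*, size_t)> reader_;
  char buffer_[256];
  size_t pos_ = 0;
  size_t size_ = 0;
};
```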
This change requires the std::function<> interface passed to the
Unpickler to return size_t rather than bool, but there are only a
few callers of this API.
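
For illustration, a hypothetical caller adapting an in-memory string
to the size_t-returning reader interface could look like the
following (makeStringReader and its exact signature are assumptions
for this sketch, not the actual PyTorch API). The key point is that
the reader now reports how many bytes it produced, which is what
lets the Unpickler refill its buffer with "up to N bytes" requests.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <string>

// Illustrative adapter: wraps a string in a size_t-returning reader.
std::function<size_t(char*, size_t)> makeStringReader(std::string data) {
  auto owned = std::make_shared<std::string>(std::move(data));
  auto pos = std::make_shared<size_t>(0);
  return [owned, pos](char* buffer, size_t len) -> size_t {
    size_t n = std::min(len, owned->size() - *pos);
    std::memcpy(buffer, owned->data() + *pos, n);
    *pos += n;
    return n;  // bytes actually copied; may be less than len, 0 at EOF
  };
}
```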
Test Plan:
buck test caffe2/test/...
benchmark in experimental/jeremyl/c2/SerializationBench
Differential Revision: D17869980
fbshipit-source-id: 37e752744d19e12b7282252c8963355970bd4feb