Buffer in Pickler to improve performance. (#27720)
Summary:
This change adds a small fixed-size buffer to Pickler to
avoid calling writer_() and the associated downstream checks
on a per-opcode/per-byte basis.
The common case still performs a bounds check, but the memcpy()
has a fixed size, and the number of backend calls is reduced.
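A minimal sketch of the buffering idea, for illustration only. The
class name BufferedWriter, the 256-byte kBufferSize, and the writer
signature are assumptions made for this example and are not taken
from the actual Pickler code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the Pickler's output path (names are illustrative).
class BufferedWriter {
 public:
  static constexpr std::size_t kBufferSize = 256;  // assumed size

  explicit BufferedWriter(std::function<void(const char*, std::size_t)> writer)
      : writer_(std::move(writer)) {}

  // Small fixed-size values (opcodes, ints) go through the buffer.
  // sizeof(T) is a compile-time constant, so the memcpy is fixed-size.
  template <typename T>
  void push(const T& value) {
    static_assert(sizeof(T) <= kBufferSize, "value larger than buffer");
    // The one bounds check that remains in the common case.
    if (pos_ + sizeof(T) > buffer_.size()) {
      flush();
    }
    assert(pos_ + sizeof(T) <= buffer_.size());
    std::memcpy(buffer_.data() + pos_, &value, sizeof(T));
    pos_ += sizeof(T);
  }

  // Large payloads bypass the buffer; flush first to preserve byte order.
  void pushBytes(const char* data, std::size_t n) {
    flush();
    writer_(data, n);
  }

  // Hands any pending bytes to the backend in a single call.
  void flush() {
    if (pos_ != 0) {
      writer_(buffer_.data(), pos_);
      pos_ = 0;
    }
  }

 private:
  std::function<void(const char*, std::size_t)> writer_;
  std::array<char, kBufferSize> buffer_{};
  std::size_t pos_ = 0;
};
```

In this sketch the backend callback runs once per full buffer rather
than once per opcode, which is the effect described above. The caller
has to call flush() once at the end so trailing bytes are not lost.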
In practice, this change speeds up the Pickle1MInts benchmark
for me locally from roughly 56 ms to 22 ms.
Additionally, this change converts a few pushIValue() calls on
typed lists, where the element type is known to be double/int/bool,
to pushInt() to bypass a bit of logic.
The Unpickler should get a similar change, but that is kept
separate, since the std::function<> prototype needs to be
changed for it to work (i.e. return size_t rather than bool).
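A rough sketch of why a size_t return could help on the read side,
under the assumption that a buffered reader wants to request a full
buffer and accept a short read near the end of input. BufferedReader
and its members are hypothetical names, not the Unpickler's API:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical buffered reader (illustrative only).
class BufferedReader {
 public:
  // reader(dst, n) copies up to n bytes into dst and returns the count copied.
  // A bool return could only say "all n bytes or failure", which does not
  // allow asking for a full buffer when fewer bytes remain.
  explicit BufferedReader(std::function<std::size_t(char*, std::size_t)> reader)
      : reader_(std::move(reader)) {}

  template <typename T>
  T read() {
    T value;
    readBytes(reinterpret_cast<char*>(&value), sizeof(T));
    return value;
  }

  void readBytes(char* dst, std::size_t n) {
    while (n > 0) {
      if (pos_ == end_) {
        // Refill: request a whole buffer, accept however much arrives.
        end_ = reader_(buffer_.data(), buffer_.size());
        pos_ = 0;
        if (end_ == 0) {
          throw std::runtime_error("unexpected end of input");
        }
      }
      assert(pos_ < end_);
      const std::size_t chunk = std::min(n, end_ - pos_);
      std::memcpy(dst, buffer_.data() + pos_, chunk);
      dst += chunk;
      pos_ += chunk;
      n -= chunk;
    }
  }

 private:
  std::function<std::size_t(char*, std::size_t)> reader_;
  std::array<char, 256> buffer_{};  // assumed size
  std::size_t pos_ = 0;
  std::size_t end_ = 0;
};
```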
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27720
Test Plan:
buck test mode/dev-nosan caffe2/test:...
Benchmark in experimental/jeremyl/c2/SerializationBench.cpp (run in mode/opt)
Differential Revision: D17847174
Pulled By: jjlilley
fbshipit-source-id: 22e5e5fd33f1a369c124ea5aac7880538e2bf6a0