String opts related to deserialization. (#28263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28263
When profiling deserialization of small data via torch::load(),
we found some straightforward string-related changes that in aggregate
improve the baseline time by 25%.
One of the main problems was overuse of std::stringstream - the
constructors alone accounted for 18%+ of the time spent. This change
speeds up unpickling/deserialization by converting a handful of the
hottest use cases from the profiles:
- unpickler's readString() drops from 10.3% of time to a negligible share.
- The QualifiedName constructor (particularly the Join call) was 8.9% of time,
  but afterwards disappears from the profiles.
- getRecordID/hasRecord were ~5% each and also shrink somewhat.
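The changes above share one pattern: build std::strings directly rather than routing small data through std::stringstream, whose per-call construction cost dominates at this scale. A minimal sketch of that pattern (hypothetical helper names, not the actual PR code):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Slow path: a fresh stringstream per call; the stream's constructor
// dominates the cost when the payload is small.
std::string readStringViaStream(const char* buf, size_t len) {
  std::stringstream ss;
  ss.write(buf, len);
  return ss.str();
}

// Fast path: construct the string directly from the buffer - one
// allocation, no stream machinery.
std::string readStringDirect(const char* buf, size_t len) {
  return std::string(buf, len);
}

// Fast join for dotted qualified names: reserve the final size once,
// then append, instead of streaming each piece through a stringstream.
std::string joinName(const std::string& prefix, const std::string& name) {
  std::string out;
  out.reserve(prefix.size() + 1 + name.size());
  out += prefix;
  out += '.';
  out += name;
  return out;
}
```

Both paths produce identical strings; the win is purely in avoiding the stream's setup and locale machinery on every call.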
ghstack-source-id: 92158727
Test Plan:
Benchmark in buck build mode/opt experimental/jeremyl/c2:SerializationBench
Correctness in buck test mode/dev-nosan caffe2/test/...
Differential Revision: D17997056
fbshipit-source-id: fc6d6c7da7557ff23c8e8c7dbe4c060abf860018