pytorch
0e2520ba - [PyTorch] Don't read 1 char per iteration in Unpickler::readString (#51901)

Commit
3 years ago
[PyTorch] Don't read 1 char per iteration in Unpickler::readString (#51901)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51901

It's much more efficient to read multiple chars with 1 memcpy than to call `read<char>` multiple times.
ghstack-source-id: 121278774

Test Plan:
Run WireSerializerBench before/after for small tensors:

```
/tmp/WireSerializerBench.Reader --real_data /mnt/homedir/hwwang/test_serialized_api_request --real_pytorch_api_request --bm_regex '[Ss]mall'
```

Before:
```
DeSerializeWire(Small)                                7.65us  130.65K
DeSerializeWire(small_Zstd)                998.49%    7.62us  131.29K
```

After:
```
DeSerializeWire(Small)                                6.86us  145.84K
DeSerializeWire(small_Zstd)               100.52%    6.82us  146.60K
DeSerializeWire(small_Snappy)             100.13%    6.85us  146.03K
DeSerializeWireIValue(Small)               83.94%    8.17us  122.42K
DeSerializeWireIValue(small_Zstd)          84.00%    8.16us  122.50K
DeSerializeWireIValue(small_Snappy)        84.53%    8.11us  123.28K
DeSerializeC2ToBlob(small_NoCompress)    1019.48%  672.58ns    1.49M
DeSerializeC2ToBlob(small_Zstd)          1020.03%  672.23ns    1.49M
DeSerializeC2ToBlob(small_Zstd_Fast)     1020.59%  671.85ns    1.49M
DeSerializeC2ToBlob(Small_Snappy)        1020.30%  672.05ns    1.49M
DeSerializeC2ToString(small)             7709.63%   88.94ns   11.24M
```

Second run after to demonstrate it wasn't just variance:
```
DeSerializeWire(Small)                                6.92us  144.57K
DeSerializeWire(small_Zstd)                99.24%    6.97us  143.47K
DeSerializeWire(small_Snappy)              99.58%    6.95us  143.97K
DeSerializeWireIValue(Small)               84.83%    8.15us  122.63K
DeSerializeWireIValue(small_Zstd)          84.72%    8.16us  122.49K
DeSerializeWireIValue(small_Snappy)        84.59%    8.18us  122.29K
DeSerializeC2ToBlob(small_NoCompress)    1031.03%  670.89ns    1.49M
DeSerializeC2ToBlob(small_Zstd)          1030.64%  671.14ns    1.49M
DeSerializeC2ToBlob(small_Zstd_Fast)     1013.39%  682.57ns    1.47M
DeSerializeC2ToBlob(Small_Snappy)        1013.95%  682.19ns    1.47M
DeSerializeC2ToString(small)             8155.98%   84.81ns   11.79M
```

By the way, this gets us closer to deserialization parity for the real data sample included in D26049387:

baseline:
```
DeSerializeWire(RealData)                             7.34ms   136.24
DeSerializeWire(RealData_Zstd)             99.95%    7.34ms   136.17
DeSerializeWire(RealData_Snappy)          100.09%    7.33ms   136.36
DeSerializeWireIValue(RealData)            82.69%    8.88ms   112.65
DeSerializeWireIValue(RealData_Zstd)       82.76%    8.87ms   112.75
DeSerializeWireIValue(RealData_Snappy)     82.68%    8.88ms   112.64
DeSerializeC2ToBlob(RealData_NoCompress)  116.87%    6.28ms   159.23
DeSerializeC2ToBlob(RealData_Zstd)        117.33%    6.26ms   159.85
DeSerializeC2ToBlob(RealData_Zstd_Fast)   117.38%    6.25ms   159.91
DeSerializeC2ToBlob(RealData_Snappy)      117.61%    6.24ms   160.23
DeSerializeC2ToString(RealData)          4571.81%  160.55us    6.23K
```

with this diff:
```
DeSerializeWire(RealData)                             6.57ms   152.17
DeSerializeWire(RealData_Zstd)            100.17%    6.56ms   152.43
DeSerializeWire(RealData_Snappy)          100.09%    6.57ms   152.31
DeSerializeWireIValue(RealData)            83.06%    7.91ms   126.40
DeSerializeWireIValue(RealData_Zstd)       83.16%    7.90ms   126.54
DeSerializeWireIValue(RealData_Snappy)     83.22%    7.90ms   126.64
DeSerializeC2ToBlob(RealData_NoCompress)  104.02%    6.32ms   158.29
DeSerializeC2ToBlob(RealData_Zstd)        103.46%    6.35ms   157.43
DeSerializeC2ToBlob(RealData_Zstd_Fast)   104.64%    6.28ms   159.23
DeSerializeC2ToBlob(RealData_Snappy)      104.65%    6.28ms   159.25
DeSerializeC2ToString(RealData)          4051.03%  162.22us    6.16K
```

Reviewed By: qizzzh

Differential Revision: D26321083

fbshipit-source-id: 92d45e760580bb290078ddac84128174daef0e55
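The idea in the summary (one bulk copy instead of one `read<char>` call per character) can be illustrated with a small standalone sketch. This is not the actual `Unpickler::readString` code: `ByteSource`, `readLineByChar`, and `BufferedReader` are hypothetical names made up for illustration, and the sketch assumes a newline-terminated string and a 256-byte buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical in-memory input; counts how often read() is called.
class ByteSource {
 public:
  explicit ByteSource(std::string data) : data_(std::move(data)) {}

  // Copies up to n bytes into dst and returns the number copied.
  std::size_t read(char* dst, std::size_t n) {
    assert(pos_ <= data_.size());
    const std::size_t count = std::min(n, data_.size() - pos_);
    std::memcpy(dst, data_.data() + pos_, count);
    pos_ += count;
    ++calls_;
    return count;
  }

  std::size_t calls() const { return calls_; }

 private:
  std::string data_;
  std::size_t pos_ = 0;
  std::size_t calls_ = 0;
};

// Slow pattern: one read() call and one push_back per character.
std::string readLineByChar(ByteSource& src) {
  std::string out;
  char c = 0;
  while (true) {
    if (src.read(&c, 1) != 1) {
      throw std::runtime_error("unexpected end of input");
    }
    if (c == '\n') {
      return out;
    }
    out.push_back(c);
  }
}

// Faster pattern: refill a buffer in bulk, find the newline with memchr,
// and append each contiguous chunk with a single copy.
class BufferedReader {
 public:
  explicit BufferedReader(ByteSource& src) : src_(src) {}

  std::string readLine() {
    std::string out;
    while (true) {
      if (pos_ == size_) {  // buffer exhausted: refill it
        size_ = src_.read(buf_, sizeof(buf_));
        pos_ = 0;
        if (size_ == 0) {
          throw std::runtime_error("unexpected end of input");
        }
      }
      const char* start = buf_ + pos_;
      const std::size_t avail = size_ - pos_;
      const void* nl = std::memchr(start, '\n', avail);
      if (nl == nullptr) {  // no terminator yet: take the whole chunk
        out.append(start, avail);
        pos_ = size_;
        continue;
      }
      const std::size_t len =
          static_cast<std::size_t>(static_cast<const char*>(nl) - start);
      out.append(start, len);
      pos_ += len + 1;  // skip past the newline
      return out;
    }
  }

 private:
  ByteSource& src_;
  char buf_[256];
  std::size_t pos_ = 0;
  std::size_t size_ = 0;
};
```

For the input `"torch\nTensor\n"`, reading both lines with `readLineByChar` makes 13 calls to `read`, while `BufferedReader::readLine` makes 1, since the second line is served from bytes already in the buffer.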