uv
f9528b4e - pep440-rs: switch Version::release to smallvec

Commit

2 years ago

This commit attempts an optimization that switches a version's `release` field over to a `smallvec`. The idea is that most versions are very small (only a few release segments) and can therefore be stored inline instead of on the heap.

Interestingly, I was unable to observe any obvious benefit:

    $ hyperfine \
        "./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
        "./target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
    Benchmark 1: ./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
      Time (mean ± σ):     872.2 ms ±  26.5 ms    [User: 14646.0 ms, System: 2516.0 ms]
      Range (min … max):   833.0 ms … 912.0 ms    10 runs

    Benchmark 2: ./target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
      Time (mean ± σ):     882.3 ms ±  17.4 ms    [User: 14764.4 ms, System: 2520.9 ms]
      Range (min … max):   859.7 ms … 912.7 ms    10 runs

    Summary
      './target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
        1.01 ± 0.04 times faster than './target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'

My hypothesis is that, because an earlier commit switched the global allocator to jemalloc, the cost of allocation had decreased precipitously, to the point that the reduction in allocations from the smallvec becomes a wash.
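
To make "stored inline" concrete, here is a minimal, std-only sketch of the small-buffer technique that `smallvec` provides. It is not the pep440-rs code (the commit itself uses `smallvec` rather than a hand-rolled type); the `Release` enum, the inline capacity of 4, and the `u64` segment type are assumptions chosen for illustration. Releases with at most four segments live inside the value with no heap allocation, and longer ones spill to a `Vec`.

```rust
// Hypothetical small-buffer storage for release segments, e.g. "1.2.3" -> [1, 2, 3].
// Illustrative only; not the pep440-rs implementation.

/// Segments kept inline before spilling to the heap (assumed value).
const INLINE_CAP: usize = 4;

#[derive(Debug, Clone)]
enum Release {
    /// Short releases: segments stored directly in the value, no allocation.
    Inline { len: u8, parts: [u64; INLINE_CAP] },
    /// Long releases: fall back to a heap-allocated Vec.
    Heap(Vec<u64>),
}

impl Release {
    fn from_slice(segments: &[u64]) -> Release {
        if segments.len() <= INLINE_CAP {
            // Copy into a fixed-size buffer; unused slots stay zero.
            let mut parts = [0u64; INLINE_CAP];
            parts[..segments.len()].copy_from_slice(segments);
            Release::Inline { len: segments.len() as u8, parts }
        } else {
            Release::Heap(segments.to_vec())
        }
    }

    /// View the segments uniformly, regardless of where they are stored.
    fn as_slice(&self) -> &[u64] {
        match self {
            Release::Inline { len, parts } => &parts[..*len as usize],
            Release::Heap(v) => v.as_slice(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, Release::Inline { .. })
    }
}

fn main() {
    let common = Release::from_slice(&[1, 2, 3]);
    let long = Release::from_slice(&[2024, 1, 15, 0, 1]);
    // prints: [1, 2, 3] inline=true
    println!("{:?} inline={}", common.as_slice(), common.is_inline());
    // prints: [2024, 1, 15, 0, 1] inline=false
    println!("{:?} inline={}", long.as_slice(), long.is_inline());
}
```

The trade-off is that every `Release` is larger than a plain `Vec` handle even when it holds a single segment, and every access branches on the variant, which is part of why inline storage is not a guaranteed win.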

To test my hypothesis, I dropped the jemalloc commit and measured the performance of the smallvec optimization against main:

    $ hyperfine \
        "./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
        "./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
    Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
      Time (mean ± σ):     968.0 ms ±  20.0 ms    [User: 17637.4 ms, System: 2151.9 ms]
      Range (min … max):   940.2 ms … 1005.3 ms    10 runs

    Benchmark 2: ./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
      Time (mean ± σ):     958.4 ms ±  15.7 ms    [User: 17119.7 ms, System: 2246.1 ms]
      Range (min … max):   944.7 ms … 993.3 ms    10 runs

    Summary
      './target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
        1.01 ± 0.03 times faster than './target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'

Fiddlesticks. Even when allocation is (presumably) more expensive, the smallvec optimization didn't help. This suggests something is off about my mental model of the code, so there are more avenues to explore here!
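
One way to check the allocation side of this mental model directly, rather than inferring it from wall-clock time, is to wrap the system allocator in a counter. The sketch below uses only `std::alloc::GlobalAlloc` and `#[global_allocator]`; `CountingAlloc` and the `parse_release` workload are hypothetical stand-ins, not puffin code. Running the same input through a `Vec`-backed and an inline-backed release would show whether the smallvec change removes allocations at all.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical helper: counts allocation requests, then forwards them to the
// system allocator. Not part of puffin or pep440-rs.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // The default `realloc` and `alloc_zeroed` go through `alloc`,
    // so Vec growth is counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total allocation requests made by this process so far.
fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

/// Hypothetical workload: parse "1.2.3" into heap-allocated release segments.
fn parse_release(s: &str) -> Vec<u64> {
    s.split('.')
        .map(|part| part.parse().expect("numeric release segment"))
        .collect()
}

fn main() {
    let inputs = ["1.2.3", "2.0", "23.1.0"];
    let before = allocations();
    let releases: Vec<Vec<u64>> = inputs.iter().map(|s| parse_release(s)).collect();
    let after = allocations();
    println!("parsed {} releases with {} allocations", releases.len(), after - before);
}
```

Because `#[global_allocator]` replaces whatever allocator the binary would otherwise use (jemalloc included), a build like this is for counting allocations, not for timing them.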

References: ag/change-global-allocator-and-smallvec
Author: BurntSushi
Committer: BurntSushi
Parents: 259a835a