next.js
5d3bc2b5 - perf(turbo-tasks): Call `.shrink_to_fit()` on common collection types when constructing a cell (#72113)

Committed 1 year ago
Cell contents are immutable once constructed, so there's no chance that they'll grow in size again. Common collection types can therefore be shrunk when a cell is constructed, to avoid storing empty spare capacity. (Note: if a collection is already correctly sized, `shrink_to_fit` bails out early.)

**Result:** This gives a ~1.4% decrease in top-line peak memory consumption, at a theoretical CPU/wall-time cost too small to measure.

**Inspiration:** The inspiration for this was vercel/turborepo#2873, which decreased task storage (not the top-line memory usage?) by ~14%, along with vercel/turborepo#8657 and other similar optimization PRs.

## Additional Opportunities

- There may be other places where a cell's `SharedReference` is constructed (e.g. persistent-storage deserialization) that are not currently covered by this change.
- Depending on the library used, deserialization may already construct exact-sized collections.
- As an additional example, manually constructing a `ReadRef` and converting it into a cell skips this optimization, because `ReadRef::cell` internally uses the type-erased shared-reference `raw_cell` API, which is incompatible with this optimization. We could special-case that in the `ReadRef::new_owned` constructor (though not in `ReadRef::new_arc`), but nobody should be manually constructing `ReadRef`s.
- We still don't call `shrink_to_fit` on `RcStr` types. Some of these are extended in place (when they have a refcount of 1) with `RcStr::map`, so we probably don't want to be too aggressive about this, to avoid `O(n^2)` time-complexity blowups.
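The core mechanism can be illustrated with a minimal standalone sketch (this is not the turbo-tasks code itself; `shrink_for_cell` is a hypothetical helper):

```rust
// Minimal sketch of the idea (not the actual turbo-tasks implementation):
// before storing a collection in an immutable cell, drop its spare
// capacity. `shrink_to_fit` is cheap when the collection is already
// exactly sized, since the standard library bails out before reallocating.
fn shrink_for_cell<T>(mut v: Vec<T>) -> Vec<T> {
    // Cell contents never grow again, so spare capacity is pure waste.
    v.shrink_to_fit();
    v
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(1024);
    v.extend(0..10);
    assert!(v.capacity() >= 1024); // spare capacity from over-allocation

    let v = shrink_for_cell(v);
    // The allocator may keep a little slack, but the ~1000 elements of
    // spare capacity are gone.
    assert!(v.capacity() < 1024);
    assert_eq!(v.len(), 10);
    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```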
## Memory Benchmark Setup

```bash
cd ~/next.js
cargo run -p next-build-test --release -- generate ~/shadcn-ui/apps/www/ > ~/shadcn-ui/apps/www/project_options.json
pnpm pack-next --project ~/shadcn-ui/apps/www/
```

```bash
cd ~/shadcn-ui
pnpm i
cd ~/shadcn-ui/apps/www/
heaptrack ~/next.js/target/release/next-build-test run sequential 1 1 '/sink'
heaptrack --analyze ~/shadcn-ui/apps/www/heaptrack.next-build-test.3604648.zst
```

### Memory Before (canary branch)

First Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

Second Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

### Memory After (this PR)

First Run:

```
peak heap memory consumption: 3.18G
peak RSS (including heaptrack overhead): 4.74G
```

Second Run:

```
peak heap memory consumption: 3.19G
peak RSS (including heaptrack overhead): 4.73G
```

This is about a 1.4% decrease in top-line memory consumption.

## Wall Time with `hyperfine` (Counter-Metric)

This is theoretically a time-memory tradeoff, as we spend some time `memcpy`ing data into smaller allocations. However, reducing memory usage can also improve cache locality, so the net effect isn't always obvious.

```
hyperfine --warmup 3 -r 30 --time-unit millisecond '~/next.js/target/release/next-build-test run sequential 1 1 /sink'
```

This benchmark is slow, taking about 30 minutes to run.

Before:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56387.5 ms ± 212.6 ms    [User: 107807.5 ms, System: 9509.8 ms]
  Range (min … max):   55934.4 ms … 56872.9 ms    30 runs
```

After:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56020.9 ms ± 235.4 ms    [User: 107483.8 ms, System: 9371.8 ms]
  Range (min … max):   55478.2 ms … 56563.6 ms    30 runs
```

This is a ~0.65% *reduction* in wall time, small enough (<2 standard deviations) to likely just be noise.
## Wall Time with `turbopack-bench` (Counter-Metric)

```
cargo bench -p turbopack-bench -p turbopack-cli -- "hmr_to_eval/Turbopack CSR"
```

Gives:

```
bench_hmr_to_eval/Turbopack CSR/1000 modules
                        time:   [15.123 ms 15.208 ms 15.343 ms]
                        change: [-0.8471% +0.4882% +1.9719%] (p = 0.55 > 0.05)
                        No change in performance detected.
```

Using https://github.com/bgw/benchmark-scripts/

In practice, it's not really possible to measure wall-time changes of less than 1%, so this is within "noise" territory (as noted in the criterion output).

Closes PACK-3361
Author: bgw