turborepo
46b888b2 - Selectively enable opt-level 1 (#8141)

This PR compiles all non-workspace dependencies, as well as `turbo-tasks-memory` (which is particularly sensitive), with basic optimizations. Most crates in the workspace still use opt-level 0 locally.

While not as good as applying opt-level 1 everywhere, this significantly reduces execution times versus opt-level 0, while making cold builds about 50-60% slower. Warm build times are largely unaffected. The debugging (gdb/lldb) experience may also be slightly worsened by the optimizations.

**What about `cargo check`/`cargo clippy`/`rust-analyzer`?**

No expected change, as (outside of proc macros) these don't perform LLVM code generation.

**Why selectively, and not everywhere?**

While applying this everywhere can give us about 3x faster execution, the selective approach still gives us *most* of the runtime performance benefits while avoiding *most* of the compilation cost (especially for warm builds). I believe we should still optimize more for build times than execution times. I benchmarked applying opt-level 1 to all crates here: https://docs.google.com/document/d/1iaREbzYpDmBt54fT2egzptTfx0OYsTIJ633gRqddzDY/edit?usp=sharing

**Why not just a few hot dependencies?**

I tried profiling the debug build and optimizing only the hot crates, but I wasn't able to get meaningful improvements in my testing.

# Benchmarking Notes

- The system configuration is here: https://github.com/bgw/benchmark-scripts . This is a downclocked machine with most CPU cores disabled to get low-noise measurements. **Treat these results as relative to each other, not as absolute values.**
- Build benchmarks are run with `mold`, as GNU `ld` is incredibly slow (and often causes OOMs with 16 GB of RAM). We're already using mold in the private nextpack meta-repository. I'll follow up with another PR to use mold or lld by default.

# Build Time Benchmarks

There's a significant regression to cold builds, but no meaningful regression for warm builds.
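The selective scheme described above can be sketched as Cargo profile overrides in the workspace root `Cargo.toml`. This is a minimal illustration, not the actual diff; the exact set of crates opted in may differ from the PR.

```toml
# Sketch of selective opt-level overrides in the workspace root Cargo.toml.
# Assumption: only turbo-tasks-memory is explicitly opted in; the real PR
# may list additional crates.

# Workspace crates keep opt-level 0 locally for fast rebuilds.
[profile.dev]
opt-level = 0

# All non-workspace dependencies get basic optimizations. Cargo's "*"
# package override matches dependencies but not workspace members.
[profile.dev.package."*"]
opt-level = 1

# turbo-tasks-memory is particularly sensitive to running unoptimized,
# so it is opted in explicitly even though it is a workspace crate.
[profile.dev.package.turbo-tasks-memory]
opt-level = 1
```

Because dependencies change rarely, the opt-level 1 cost for them is paid mostly on cold builds, which matches the benchmark results below: cold builds regress while warm builds are largely unaffected.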
## Cold time to build tests (2 runs):

```
rm -rf target/ && time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo nextest run -- dummy_filter_build_only_dont_run_any_tests
```

Before:

```
real 9m29.839s
real 9m27.522s
```

After:

```
real 15m28.105s
real 15m28.577s
```

## Warm time to build tests (2 runs):

Modify a string in an error message inside of `crates/turbopack-ecmascript/src/minify.rs`. This forces recompilation of all dependent crates without meaningfully changing any behavior. Then run:

```
time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo nextest run -- dummy_filter_build_only_dont_run_any_tests
```

Before:

```
real 1m33.497s
real 1m36.134s
```

After:

```
real 1m41.232s
real 1m40.153s
```

## Warm time to build single binary (2 runs):

This is less dependent on linking than the tests, which generate many binary targets. Modify a string in an error message inside of `crates/turbopack-ecmascript/src/minify.rs`. This forces recompilation of all dependent crates without meaningfully changing any behavior.
Then run:

```
time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo build -p turbopack-cli
```

Before:

```
real 0m37.565s
real 0m37.058s
```

After:

```
real 0m36.450s
real 0m36.849s
```

## Cold time to build a single turborepo binary:

```
rm -rf target/ && time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo build -p turbo
```

Before:

```
real 3m43.488s
```

After:

```
real 4m54.416s
```

# Execution Time Benchmarks

## turbopack-cli's `bench_startup`

```
cargo bench --profile dev -p turbopack-cli
```

Before:

```
bench_startup/Turbopack CSR/1000 modules
time: [20.744 s 20.869 s 20.995 s]
```

After:

```
bench_startup/Turbopack CSR/1000 modules
time: [7.8037 s 7.8505 s 7.9030 s]
```

## Test Execution (excluding build, 2 runs)

With a completely warm build cache (such that nothing needs to build), run:

```
time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo nextest run -E 'not test(node_file_trace)'
```

Before:

```
real 2m51.767s
real 2m51.482s
```

After:

```
real 1m17.286s
real 1m12.520s
```
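As a usage note: instead of prefixing every command with `RUSTFLAGS=-Clink-arg=-fuse-ld=mold` as in the benchmarks above, the linker can be made the local default via `.cargo/config.toml`. A minimal sketch, assuming an x86-64 Linux host (adjust the target triple for your platform):

```toml
# .cargo/config.toml — sketch of defaulting to mold locally, so the
# RUSTFLAGS prefix is no longer needed on each invocation.
# Assumption: x86-64 Linux; change the target triple as appropriate.
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Note that setting `rustflags` here applies to all builds for that target, so team-wide adoption (as the follow-up PR mentioned above proposes) should account for contributors who don't have mold installed.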
Author: bgw