next.js
05553796 - turbo-tasks: task-storage memory wins (#93720)

Commit
12 days ago
turbo-tasks: task-storage memory wins (#93720)

## Summary

Four small, independent changes that shrink `TaskStorage` and the data it owns. Recommend reviewing commit-by-commit.

1. **`Arc<CachedTaskType>` → `triomphe::Arc<CachedTaskType>`.** `triomphe::Arc` is already a workspace dep used in `ReadRef` / `SharedReference`. `CachedTaskType` never appears in a `Weak<...>`, so we can drop the weak count and the CAS in `drop_slow`. Saves one `usize` per allocation. Migrated via a `CachedTaskTypeArc` newtype so the bincode `Encode`/`Decode` impls don't run afoul of the orphan rule.
2. **Niche-encode `CellDependency`.** The `cell_dependencies` / `cell_dependents` sets used to hold `(CellRef, Option<u64>)` tuples. `Option<u64>` cost a full 16 B (8 B discriminant + 8 B value, aligned), making each element 32 B. A `CellDependency` enum with two variants (`All(CellRef)` / `Hash(CellRef, u64)`) lets the layout algorithm reuse the niche on `ValueTypeId` (`NonZero<u16>`) inside `CellRef.cell.type_id` for the variant tag. Element size drops 32 → 24 B; `LazyField` drops from 56 → 48 B. The same enum backs both forward and reverse edges; for `cell_dependents` we re-point `CellRef.task` at the dependent task. Added `CellDependency::into_parts()` and use it in the `iter_cell_dependents` / `iter_cell_dependencies` hot loops so the discriminant is checked once instead of twice via back-to-back `cell_ref()` + `key()` calls.
3. **`TaskStorage::lazy: Vec<LazyField>` → `TinyVec<LazyField>`.** The lazy vec only ever holds up to ~25 elements (one per declared lazy field in the schema). Swapping `Vec`'s 24 B `(ptr, len, cap)` header for `(ptr, len: u8, cap: u8)` + 6 B padding gives 16 B. Drops `size_of::<TaskStorage>()` from 136 → 128 B. `TinyVec` is hand-rolled, so I added a push/iter micro-benchmark to confirm it doesn't lose performance vs std `Vec`. Results below.
4. **Rightsize collections.** Go through the `AutoSet`/`AutoMap` types in `storage_schema` and ensure each one is maximally sized for its natural alignment.

## Benchmark results

### `next build` on a representative app (15 runs each, M4 Pro, `caffeinate -dimsu nice -n -20`)

Fresh same-day baseline against branch:

| metric | canary | branch | Δ | 95% CI | significant? |
|---|---:|---:|---:|---|:---:|
| wall time | 40.83s | 41.12s | +0.7% | [−1.07s, +1.64s] | no |
| user time | 282.27s | 283.21s | +0.3% | [−1.02s, +2.89s] | no |
| sys time | 69.38s | 71.26s | +2.7% | [−1.54s, +5.32s] | no |
| **MaxRSS** | **12.47 GB** | **12.04 GB** | **−3.4%** | **[−0.48 GB, −0.38 GB]** | **yes** |

**MaxRSS is the headline.** −0.43 GB on a 12.5 GB working set, with t=−17.86 (every branch run lower than every canary run, CV ≤ 0.6% on both sides). Wall / user / sys are all within noise: this PR is a memory win with no measurable timing impact.

### `TinyVec` vs `Vec` micro-bench (`turbo-tasks/benches/tiny_vec.rs`, 200 samples each)

| n | Vec push | TinyVec push | Δ% | Vec iter | TinyVec iter | Δ% |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 1.31ns | 894ps | **−31.8%** | 598ps | 596ps | −0.4% |
| 1 | 16.92ns | 14.75ns | **−12.9%** | 964ps | 952ps | −1.2% |
| 4 | 17.93ns | 15.93ns | **−11.1%** | 1.49ns | 1.50ns | +0.5% |
| 8 | 63.13ns | 45.24ns | **−28.3%** | 1.97ns | 1.96ns | −0.2% |
| 16 | 97.35ns | 79.91ns | **−17.9%** | 3.16ns | 3.14ns | −0.5% |
| 24 | 137.41ns | 119.88ns | **−12.8%** | 4.30ns | 4.30ns | +0.0% |

TinyVec push is 11–32% faster than Vec push across all realistic sizes; iter is identical. Run with `cargo bench -p turbo-tasks --bench tiny_vec`.

### `task_overhead/turbo` Criterion bench (M4 Pro, `--sample-size 200`)

| variant | dur | canary | branch | Δ | significant? |
|---|---:|---:|---:|---:|:---:|
| turbo-uncached | 1µs | 9.77 µs | 9.68 µs | −1.0% | yes |
| turbo-uncached | 1000µs | 1.01 ms | 1.01 ms | −0.1% | yes |
| turbo-cached-same-keys | 1µs | 198.6 ns | 191.9 ns | −3.4% | yes |
| turbo-cached-same-keys | 100µs | 226.5 ns | 208.1 ns | −8.1% | yes |
| turbo-cached-different-keys | 1µs | 233.8 ns | 224.1 ns | −4.2% | yes |
| turbo-cached-different-keys | 100µs | 305.3 ns | 246.9 ns | −19.1% | yes |
| turbo-uncached-parallel | 10µs | 1.63 µs | 1.54 µs | −5.8% | yes |
| turbo-uncached-parallel | 100µs | 8.41 µs | 7.88 µs | −6.3% | yes |

<!-- NEXT_JS_LLM_PR -->
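
The size arithmetic in changes 2 and 3 can be sanity-checked with `std::mem::size_of`. The sketch below is not the turbo-tasks code: `ValueTypeId`, `CellId`, `CellRef`, and `TinyVecHeader` are hypothetical stand-ins whose field types are assumptions (only `ValueTypeId` being a `NonZero<u16>` comes from the description above), and the `into_parts` signature is a guess at the shape of the real method. Rust does not guarantee the default struct/enum layout, so the 32 B / 24 B / 24 B / 16 B figures are what a current 64-bit rustc is expected to report, not a language guarantee. The sketch checks sizes only; it does not show which mechanism the compiler uses to store the variant tag.

```rust
use std::mem::size_of;
use std::num::{NonZeroU16, NonZeroU32};
use std::ptr::NonNull;

// Hypothetical stand-ins; the real turbo-tasks definitions may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ValueTypeId(NonZeroU16);

#[derive(Clone, Copy, Debug, PartialEq)]
struct CellId {
    type_id: ValueTypeId,
    index: u32, // assumed field
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct CellRef {
    task: NonZeroU32, // assumed task id representation
    cell: CellId,
}

// Two data-carrying variants instead of a `(CellRef, Option<u64>)` tuple.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CellDependency {
    All(CellRef),
    Hash(CellRef, u64),
}

impl CellDependency {
    // Assumed signature: one `match`, so the variant is inspected once
    // rather than once per accessor call.
    fn into_parts(self) -> (CellRef, Option<u64>) {
        match self {
            CellDependency::All(cell) => (cell, None),
            CellDependency::Hash(cell, key) => (cell, Some(key)),
        }
    }
}

// Header only: a pointer plus one-byte length and capacity.
#[allow(dead_code)]
struct TinyVecHeader<T> {
    ptr: NonNull<T>,
    len: u8,
    cap: u8,
}

fn main() {
    println!("old tuple element: {} B", size_of::<(CellRef, Option<u64>)>());
    println!("CellDependency:    {} B", size_of::<CellDependency>());
    println!("Vec header:        {} B", size_of::<Vec<u64>>());
    println!("TinyVec header:    {} B", size_of::<TinyVecHeader<u64>>());

    let cell = CellRef {
        task: NonZeroU32::new(7).unwrap(),
        cell: CellId {
            type_id: ValueTypeId(NonZeroU16::new(1).unwrap()),
            index: 0,
        },
    };
    assert_eq!(CellDependency::Hash(cell, 42).into_parts(), (cell, Some(42)));
    assert_eq!(CellDependency::All(cell).into_parts(), (cell, None));
}
```

With these stand-ins the tuple pays for `Option<u64>`'s own 8-byte-aligned discriminant, while the enum's largest variant needs only 20 B of payload and rounds up to 24 B. The one-byte `len`/`cap` fields cap the stand-in header at 255 elements, which is comfortably above the ~25 lazy fields mentioned in change 3.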