refactor(turbo-tasks): Make CURRENT_TASK_STATE shareable across multiple tokio tasks / threads, wait for detached work to finish before exiting a turbo task (#68921)
The current plan is that tasks using `local_cells` will still be spawned on separate threads using `tokio::spawn`. This means two things:
- `CURRENT_TASK_STATE` must use an `Arc<Mutex<...>>` so that multiple tasks across multiple threads can modify the same cell counters, local cells, etc.
- We can't `.take()` the data structures in `CURRENT_TASK_STATE` and perform cleanup of the parent task until all the child tasks using `local_cells` are complete.
Waiting for cleanup of child tasks is performed using [`tokio_util::task::TaskTracker`](https://docs.rs/tokio-util/latest/tokio_util/task/task_tracker/struct.TaskTracker.html), which seems like a reasonably clean and lightweight way of doing this (it uses some refcounts and a `tokio::sync::Notify` instance).
Because `TurboTasks::detached_for_testing` suffers from the same task-cleanup problems that `local_cells` will face, I'm using that API to test some of these changes.
## Testing
Added test coverage for `TurboTasks::detached_for_testing` that asserts that the parent task waits for the detached task to complete before exiting. Confirmed that the test fails if I comment out
```
ltt.wait().await
```
in the `TurboTasks::schedule` method.
Ran the test with:
```
cargo nextest r -p turbo-tasks-memory -- test_spawns_detached
```
## Benchmarking
*The switch from a `RefCell` to a `RwLock` will inevitably make this code somewhat slower, though an uncontended `RwLock` should still be fast enough. Lock contention only occurs when detached futures or `local_cells` are actually in use.*
On a [low noise machine](https://github.com/bgw/benchmark-scripts):
```
cargo bench -p turbopack-bench -p turbopack-cli -- "hmr_to_eval/Turbopack CSR"
```
Manually modified the benchmark to use more samples and a longer measurement time:
```diff
diff --git a/turbopack/crates/turbopack-bench/src/lib.rs b/turbopack/crates/turbopack-bench/src/lib.rs
index 4e3df12db0..e2f16fb0d6 100644
--- a/turbopack/crates/turbopack-bench/src/lib.rs
+++ b/turbopack/crates/turbopack-bench/src/lib.rs
@@ -127,8 +127,8 @@ enum CodeLocation {
pub fn bench_hmr_to_eval(c: &mut Criterion, bundlers: &[Box<dyn Bundler>]) {
let mut g = c.benchmark_group("bench_hmr_to_eval");
- g.sample_size(10);
- g.measurement_time(Duration::from_secs(60));
+ g.sample_size(50);
+ g.measurement_time(Duration::from_secs(300));
bench_hmr_internal(g, CodeLocation::Evaluation, bundlers);
}
```
**Before:**
```
bench_hmr_to_eval/Turbopack CSR/1000 modules
time: [14.714 ms 14.776 ms 14.854 ms]
Found 2 outliers among 50 measurements (4.00%)
1 (2.00%) high mild
1 (2.00%) high severe
```
**After:**
```
bench_hmr_to_eval/Turbopack CSR/1000 modules
time: [14.773 ms 14.846 ms 14.937 ms]
change: [-0.9937% +0.4037% +1.8024%] (p = 0.58 > 0.05)
No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
1 (2.00%) high severe
```
## Memory Benchmarking
```
cd ~/next.js
cargo build -p next-build-test --release
cargo run -p next-build-test --release -- generate ~/shadcn-ui/apps/www/ > ~/shadcn-ui/apps/www/project_options.json
cd ~/shadcn-ui/apps/www
heaptrack ~/next.js/target/release/next-build-test run sequential 1 1 '/sink'
heaptrack --analyze /home/bgw.linux/shadcn-ui/apps/www/heaptrack.next-build-test.1138984.zst | tail -50
```
Before *(Run 1)*:
```
total runtime: 791.00s.
calls to allocation functions: 245495035 (310358/s)
temporary memory allocations: 19923850 (25188/s)
peak heap memory consumption: 7.61G
peak RSS (including heaptrack overhead): 9.97G
total memory leaked: 12.53M
suppressed leaks: 7.24K
```
Before *(Run 2)*:
```
total runtime: 635.10s.
calls to allocation functions: 245867464 (387133/s)
temporary memory allocations: 19575558 (30822/s)
peak heap memory consumption: 7.62G
peak RSS (including heaptrack overhead): 9.96G
total memory leaked: 13.16M
suppressed leaks: 7.24K
```
After *(Run 1)*:
```
total runtime: 792.50s.
calls to allocation functions: 253642374 (320053/s)
temporary memory allocations: 19250501 (24290/s)
peak heap memory consumption: 7.62G
peak RSS (including heaptrack overhead): 9.96G
total memory leaked: 12.98M
suppressed leaks: 7.24K
```
After *(Run 2)*:
```
total runtime: 691.16s.
calls to allocation functions: 252890162 (365891/s)
temporary memory allocations: 19143529 (27697/s)
peak heap memory consumption: 7.61G
peak RSS (including heaptrack overhead): 9.98G
total memory leaked: 13.42M
suppressed leaks: 7.24K
```