next.js
f8f77da9 - turbo-tasks-backend: batch schedule dirty tasks in aggregation_update (#91461)

Commit
35 days ago
### What?

Batch the scheduling of dirty tasks during aggregation update processing instead of scheduling them one at a time.

In `aggregation_update.rs`, `find_and_schedule_dirty_internal` previously called `ctx.schedule()` immediately for each task it decided to schedule. This involved a separate `ctx.task(id, TaskDataCategory::All)` call per task to read `leaf_distance` and `has_output` before calling `turbo_tasks.schedule(id, priority)`.

### Why?

Each individual `ctx.task(...)` call may require restoring task data from persistent storage (disk I/O). When many tasks need scheduling (e.g. during a large invalidation wave, or initially when all session-dependent tasks are scheduled), doing this one at a time is inefficient. `for_each_task_all` (via `prepare_tasks_with_callback`) can batch the restoration of persistent task data, amortizing I/O overhead across all tasks in the batch.

### How?

- Added a `scheduled_tasks: FxHashMap<TaskId, TaskPriority>` field to `AggregationUpdateQueue` (skipped in bincode serialization, since it is transient state that doesn't survive suspend/resume).
- `find_and_schedule_dirty_internal` now inserts `(task_id, parent_priority)` into `scheduled_tasks` instead of calling `ctx.schedule()` directly.
- At the end of `process()` (in the final `else` branch, when all sub-queues are exhausted), if `scheduled_tasks` is non-empty, `ctx.for_each_task_all` is called to batch-fetch the tasks and schedule them, then `scheduled_tasks` is cleared. This maximizes batch size by deferring all scheduling to the end of the operation.
- `scheduled_tasks` is included in `is_empty()` (with exhaustive struct destructuring) so the compiler enforces the check stays up to date.

---------

Co-authored-by: Tobias Koppers <sokra@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
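The deferred-scheduling pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual turbo-tasks-backend code: `MockContext`, `Queue`, `find_and_schedule_dirty`, and `flush` are hypothetical stand-ins, `TaskId` and `TaskPriority` are simplified to integers, `std::collections::HashMap` replaces `FxHashMap` to keep the example dependency-free, and the mock `for_each_task_all` only imitates the idea of one batched restore per call.

```rust
use std::collections::HashMap;

// Simplified stand-ins (assumptions): the real types are richer.
type TaskId = u32;
type TaskPriority = u32;

/// Hypothetical mock of the execution context. It counts how many
/// storage restores happen so the effect of batching is visible.
#[derive(Default)]
struct MockContext {
    restore_calls: usize,
    scheduled: Vec<(TaskId, TaskPriority)>,
}

impl MockContext {
    /// Mock batched access: one simulated restore for the whole batch,
    /// then the callback runs once per task.
    fn for_each_task_all(&mut self, ids: &[TaskId], mut f: impl FnMut(&mut Self, TaskId)) {
        self.restore_calls += 1;
        for &id in ids {
            f(self, id);
        }
    }

    fn schedule(&mut self, id: TaskId, priority: TaskPriority) {
        self.scheduled.push((id, priority));
    }
}

/// Hypothetical, reduced version of an aggregation update queue.
#[derive(Default)]
struct Queue {
    // Transient state: tasks waiting to be scheduled in one batch.
    scheduled_tasks: HashMap<TaskId, TaskPriority>,
}

impl Queue {
    /// Records the task instead of scheduling it immediately.
    /// In this sketch a repeated insert overwrites the earlier priority.
    fn find_and_schedule_dirty(&mut self, task_id: TaskId, parent_priority: TaskPriority) {
        self.scheduled_tasks.insert(task_id, parent_priority);
    }

    /// Exhaustive destructuring: adding a field to `Queue` without
    /// handling it here becomes a compile error.
    fn is_empty(&self) -> bool {
        let Queue { scheduled_tasks } = self;
        scheduled_tasks.is_empty()
    }

    /// Schedules all recorded tasks through a single batched call and
    /// leaves `scheduled_tasks` empty afterwards.
    fn flush(&mut self, ctx: &mut MockContext) {
        if self.scheduled_tasks.is_empty() {
            return;
        }
        let batch = std::mem::take(&mut self.scheduled_tasks);
        let mut ids: Vec<TaskId> = batch.keys().copied().collect();
        ids.sort_unstable(); // deterministic order, for the example only
        ctx.for_each_task_all(&ids, |ctx, id| ctx.schedule(id, batch[&id]));
    }
}
```

In this sketch, calling `find_and_schedule_dirty` for several tasks and then `flush` once results in a single simulated restore regardless of how many tasks were queued, which is the amortization the commit is after. A second `flush` on an empty queue does nothing.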