turbo-tasks-backend: batch find_and_schedule_dirty using for_each_task_meta (#91497)
### What?
Batch-process `find_and_schedule_dirty` in `aggregation_update.rs` by
collecting all queued jobs (up to `FIND_AND_SCHEDULE_BATCH_SIZE` =
10,000) into a `SmallVec` and pre-fetching their task metadata with a
batched `ctx.for_each_task_meta(...)` call.
### Why?
`find_and_schedule` can accumulate thousands of tasks during
invalidation cascades. The previous implementation issued one
`ctx.task(...)` call per task inside `process()`, so backing-storage
fetches ran serially, one task at a time.
`ctx.for_each_task_meta` triggers a batched fetch of all task metadata
from the backing store: keys are sorted by hash for cache-friendly
sequential access to the storage layer. The callback is invoked per-task
once data is ready, with the task guard handed directly to the callback
— no second lock acquisition needed.
The batch limit is set to 10,000 because find-and-schedule jobs are
cheap (a metadata read plus an optional schedule) compared to
aggregation-update jobs, so yielding less often is safe and beneficial.
### How?
**`fn process` — `find_and_schedule` branch:**
- Replace the one-at-a-time loop with a single
`drain(..FIND_AND_SCHEDULE_BATCH_SIZE)` that collects up to 10,000
`FindAndScheduleJob` structs into a `SmallVec`, then calls
`find_and_schedule_dirty` once with the whole batch.
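The batching step can be sketched roughly as follows. This is a minimal, self-contained model, not the real implementation: `FindAndScheduleJob` is reduced to a placeholder struct, the queue is a plain `VecDeque`, and a `Vec` stands in for the `SmallVec` to keep the sketch dependency-free.

```rust
use std::collections::VecDeque;

// Batch limit from the PR; jobs beyond this stay queued for the next pass.
const FIND_AND_SCHEDULE_BATCH_SIZE: usize = 10_000;

// Hypothetical stand-in for the real FindAndScheduleJob struct.
#[derive(Debug, PartialEq)]
struct FindAndScheduleJob {
    task_id: u32,
}

// Drain up to FIND_AND_SCHEDULE_BATCH_SIZE queued jobs in one go and hand
// the whole batch to the processing function in a single call, instead of
// calling it once per job.
fn drain_find_and_schedule_batch(
    queue: &mut VecDeque<FindAndScheduleJob>,
) -> Vec<FindAndScheduleJob> {
    let take = queue.len().min(FIND_AND_SCHEDULE_BATCH_SIZE);
    // In the real code, find_and_schedule_dirty(batch) is called once here.
    queue.drain(..take).collect()
}

fn main() {
    let mut queue: VecDeque<FindAndScheduleJob> =
        (0..15_000).map(|i| FindAndScheduleJob { task_id: i }).collect();
    let batch = drain_find_and_schedule_batch(&mut queue);
    assert_eq!(batch.len(), 10_000);
    assert_eq!(queue.len(), 5_000);
}
```

Leftover jobs remain in the queue, so an oversized backlog is simply worked off over successive `process()` passes.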
**`fn find_and_schedule_dirty`:**
- Change parameter from `task_id: TaskId` to `jobs:
SmallVec<[FindAndScheduleJob; 4]>`.
- Call `ctx.for_each_task_meta(...)` to batch-prefetch all task metadata
(sorted by hash for cache-friendly access) and process each task in the
callback.
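A simplified model of the `for_each_task_meta` pattern is sketched below. The names (`TaskMeta`, `hash_of`, the `HashMap` standing in for the backing store) are illustrative stand-ins, not the real backend types; the point is the shape: sort the requested keys by hash, fetch in that order, and hand each result directly to the callback.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type TaskId = u32;

// Hypothetical task metadata; the real type lives in the backend storage.
#[derive(Clone)]
struct TaskMeta {
    dirty: bool,
}

fn hash_of(id: TaskId) -> u64 {
    let mut h = DefaultHasher::new();
    id.hash(&mut h);
    h.finish()
}

// Simplified model of ctx.for_each_task_meta: sort the requested ids by
// key hash so backing-store reads happen in cache-friendly order, then
// invoke the callback once per task with the fetched metadata, so no
// second lookup is needed inside the callback.
fn for_each_task_meta(
    store: &HashMap<TaskId, TaskMeta>,
    mut ids: Vec<TaskId>,
    mut callback: impl FnMut(TaskId, &TaskMeta),
) {
    ids.sort_by_key(|&id| hash_of(id));
    for id in ids {
        if let Some(meta) = store.get(&id) {
            callback(id, meta);
        }
    }
}

fn main() {
    let store: HashMap<TaskId, TaskMeta> = (0..8)
        .map(|id| (id, TaskMeta { dirty: id % 2 == 0 }))
        .collect();
    // Schedule every dirty task found in the batch.
    let mut scheduled = Vec::new();
    for_each_task_meta(&store, (0..8).collect(), |id, meta| {
        if meta.dirty {
            scheduled.push(id);
        }
    });
    scheduled.sort();
    assert_eq!(scheduled, vec![0, 2, 4, 6]);
}
```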
**`prepare_tasks_with_callback` fix:**
- Release `TaskLockCounter` *before* calling `prepared_task_callback`
instead of after. This ensures the counter is 0 when the callback runs,
so callbacks that drop their task guard and then call `ctx.task()` (like
`find_and_schedule_dirty_internal` → `ctx.schedule()`) no longer trigger
the "Concurrent task lock acquisition detected" panic in debug builds.
- `for_each_task` now uses `acquire()` instead of `reacquire()`, since
the counter is guaranteed to be 0 at callback entry. The now-unused
`reacquire()` is removed.
**Cleanup:**
- Use `FxHashMap` (already imported) instead of
`std::collections::HashMap`.
- Combine consecutive `#[cfg(trace_find_and_schedule)]` let bindings for
clarity.
- Add `FIND_AND_SCHEDULE_BATCH_SIZE = 10_000` as a self-contained
constant (not derived from `MAX_COUNT_BEFORE_YIELD`).
---------
Co-authored-by: Tobias Koppers <sokra@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>