turbo
57cf69c2 - perf: Deduplicate file hashing and parallelize globwalks (#11902)

Commit
98 days ago
## Summary

Optimizes `turbo run --dry` wall-clock time by up to 1.48x on large monorepos by eliminating redundant file hashing work and removing a serialization bottleneck in globwalk operations.

### Benchmarks

Tested across three repos of varying size:

| Repo | Packages | Before | After | Speedup |
|------|----------|--------|-------|---------|
| large | ~1000 | 5.903s | 3.999s | **1.48x** |
| medium | ~120 | 1.461s | 1.380s | 1.06x |
| small | ~6 | 0.659s | 0.693s | ~1.0x (noise) |

The improvement scales with repo size, specifically with how many tasks share the same `(package, inputs)` combination.

### Changes

**File hash deduplication**: Multiple tasks in the same package with identical `inputs` config (e.g. `build`, `lint`, `typecheck` all in one package) previously each ran an independent globwalk + file hash computation. Now tasks are grouped by `(package_path, globs, include_default)` and each unique combination is computed once, with results shared across tasks.

**Parallel globwalks via retry-on-EMFILE**: The previous `IoSemaphore` (max=1) serialized all globwalk operations to prevent fd exhaustion, making this the dominant bottleneck on large repos. This replaces the semaphore with retry-with-exponential-backoff on `EMFILE` errors (the same pattern Node's `graceful-fs` uses), allowing globwalks to run fully parallel on rayon. If the OS returns "too many open files", the operation sleeps briefly and retries, up to 10 times with exponential backoff capped at 1s.

**Zero-copy lockfile dependency lookups**: `Lockfile::all_dependencies` now returns `Cow<'_, HashMap<String, String>>` instead of cloning the HashMap on every call. For pnpm (which pre-builds a dependency index), this eliminates ~329k HashMap clones during transitive closure resolution.
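
The retry-on-`EMFILE` behaviour described under "Parallel globwalks" could look roughly like the sketch below. It is not the code from this commit: `retry_on_emfile`, the 1 ms starting delay, and the hard-coded errno value are assumptions made for illustration. Only the 10-retry limit and the 1s cap come from the description above.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Raw OS error code for EMFILE ("too many open files").
/// Assumption: 24 is the value on Linux and macOS; real code would
/// normally take the constant from the `libc` crate instead.
const EMFILE: i32 = 24;

/// Retry limit stated in the commit message.
const MAX_RETRIES: u32 = 10;
/// Starting delay. Assumption: the commit message only says "sleeps briefly".
const INITIAL_DELAY: Duration = Duration::from_millis(1);
/// Backoff cap stated in the commit message.
const MAX_DELAY: Duration = Duration::from_secs(1);

/// Runs `op`, retrying with exponential backoff while it fails with EMFILE.
/// Any other error is returned immediately, and an EMFILE that persists
/// after the last retry is returned to the caller as well.
pub fn retry_on_emfile<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    let mut delay = INITIAL_DELAY;
    for _ in 0..MAX_RETRIES {
        match op() {
            Err(e) if e.raw_os_error() == Some(EMFILE) => {
                // Out of file descriptors: wait for other workers to
                // close theirs, then double the delay up to the cap.
                thread::sleep(delay);
                delay = (delay * 2).min(MAX_DELAY);
            }
            // Success or an unrelated error: hand it straight back.
            other => return other,
        }
    }
    // Final attempt once all retries have been used.
    op()
}
```

A caller would wrap each fallible filesystem step, for example `retry_on_emfile(|| std::fs::read_dir(&path))`, so that workers running in parallel back off individually instead of queueing behind a single permit.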

**Optimized transitive closure cache keys**: The `DashMap` resolve cache now uses a single null-byte-separated `String` key built into a reusable buffer, instead of allocating a `(String, String, String)` tuple per lookup.

**HashMap importers for pnpm**: Converted pnpm's `importers` field from `BTreeMap` to `HashMap` (with sorted serialization) for O(1) workspace lookups during `resolve_package`.
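
To illustrate the cache-key change, here is a minimal sketch of a null-byte-separated key written into a reusable buffer. It uses a plain `std::collections::HashMap` in place of `DashMap` so that it needs no external crates. `ResolveCache`, `write_key`, and the three key components (workspace path, package name, version specifier) are hypothetical names chosen for the example, since the commit message does not say what the tuple fields were.

```rust
use std::collections::HashMap;

/// Writes the three key parts into `buf`, separated by NUL bytes.
/// Clearing keeps the buffer's capacity, so repeated lookups do not
/// allocate once the buffer has grown large enough.
pub fn write_key(buf: &mut String, workspace: &str, name: &str, specifier: &str) {
    buf.clear();
    buf.push_str(workspace);
    buf.push('\0');
    buf.push_str(name);
    buf.push('\0');
    buf.push_str(specifier);
}

/// Hypothetical single-threaded stand-in for the resolve cache.
pub struct ResolveCache {
    entries: HashMap<String, String>,
    key_buf: String,
}

impl ResolveCache {
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
            key_buf: String::new(),
        }
    }

    /// Looks up a resolution without allocating a new key.
    pub fn get(&mut self, workspace: &str, name: &str, specifier: &str) -> Option<&str> {
        write_key(&mut self.key_buf, workspace, name, specifier);
        self.entries.get(self.key_buf.as_str()).map(String::as_str)
    }

    /// Stores a resolution; the key is cloned only on insert.
    pub fn insert(&mut self, workspace: &str, name: &str, specifier: &str, resolved: String) {
        write_key(&mut self.key_buf, workspace, name, specifier);
        self.entries.insert(self.key_buf.clone(), resolved);
    }
}
```

The separator only has to be a character that cannot occur inside any key part, which is presumably why a NUL byte is used. In this sketch a cache hit costs one hash of a borrowed `&str` and no allocation, whereas a tuple key would need three owned `String`s built per lookup.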