perf: Fast path for shallow wildcard glob patterns in workspace discovery (#11972)
## Summary
Workspace package discovery uses glob patterns like
`packages/*/package.json` to find workspace packages. Previously, each
pattern was dispatched as a single rayon task running a sequential `wax`
directory walk. For large monorepos with hundreds of packages under one
directory, the slowest walker dominated the entire phase.
This PR adds a fast path for "shallow wildcard" patterns: globs with
exactly one `*` segment between a literal prefix and a literal suffix.
Instead of a full recursive directory walk, these are expanded with a
`readdir` of the prefix directory followed by parallel `stat` calls
spread across all rayon workers.
## Problem
Profiling `turbo build --dry-run` on large monorepos revealed extreme
load imbalance in the glob walker phase:
| Glob pattern | Directory entries | Walk time |
|---|---|---|
| `packages/*/package.json` | ~600 | **54ms** |
| `services/*/package.json` | ~320 | **41ms** |
| `apps/*/package.json` | ~26 | 3ms |
| Other patterns | 1–30 | <1ms |
The slowest walker gates the entire `parse_package_jsons` phase. The
54ms walker runs sequentially on one rayon thread while other threads
sit idle after finishing in <1ms.
## Approach
`walk_compiled_globs` now partitions include patterns into three
categories:
1. **Invariant** (no wildcards) — single `stat` syscall (existing
optimization)
2. **Shallow wildcard** (`<prefix>/*/<suffix>`) — `readdir` + parallel
`stat` (**new**)
3. **General variant** — full `wax` directory walk (existing fallback)
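
To illustrate category 2, here is a minimal sketch of the `readdir` + parallel `stat` expansion. It is not the code from this PR: the function name `expand_shallow_wildcard`, its parameters, and the error handling are assumptions, and it uses `std::thread::scope` in place of rayon so the example depends only on the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical helper: expands `<prefix>/*/<suffix>` relative to `base`.
/// Lists `<base>/<prefix>` once, then checks `<entry>/<suffix>` for every
/// entry, splitting the checks across up to `workers` threads.
pub fn expand_shallow_wildcard(
    base: &Path,
    prefix: &str,
    suffix: &str,
    workers: usize,
) -> io::Result<Vec<PathBuf>> {
    // One readdir call produces every candidate path.
    // Assumption: a missing prefix directory is reported as an error here.
    let mut candidates: Vec<PathBuf> = Vec::new();
    for entry in fs::read_dir(base.join(prefix))? {
        candidates.push(entry?.path().join(suffix));
    }
    if candidates.is_empty() {
        return Ok(candidates);
    }

    // Ceiling division so that at most `workers` chunks are created.
    let workers = workers.max(1);
    let chunk_size = (candidates.len() + workers - 1) / workers;

    let mut matches: Vec<PathBuf> = thread::scope(|scope| {
        let handles: Vec<_> = candidates
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    // `fs::metadata` is the stat call; entries that are not
                    // directories, or lack the suffix, simply fail the check.
                    chunk
                        .iter()
                        .filter(|path| fs::metadata(path).is_ok())
                        .cloned()
                        .collect::<Vec<PathBuf>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("stat worker panicked"))
            .collect()
    });

    // Sort so the result does not depend on readdir order.
    matches.sort();
    Ok(matches)
}
```

Because each `stat` is independent, the work divides evenly no matter how many entries a single directory holds, which is what removes the one-slow-walker bottleneck described above.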
`try_decompose_shallow_wildcard` detects eligible patterns:
- Rejects `**`, `?`, `[`, `{` metacharacters
- Requires exactly one bare `*` segment with a non-empty literal suffix
- Falls back to the full walk for anything it can't handle
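
The detection rules above can be sketched as follows. This is an illustrative reimplementation rather than the function in this PR: the `ShallowWildcard` struct, the string-based signature, and the decision to accept an empty prefix are assumptions.

```rust
/// Hypothetical result type: the literal path before the `*` segment and
/// the literal path after it.
#[derive(Debug, PartialEq, Eq)]
pub struct ShallowWildcard {
    pub prefix: String,
    pub suffix: String,
}

/// Returns `Some` only for patterns shaped like `<prefix>/*/<suffix>`.
/// `None` means the caller should fall back to the full directory walk.
pub fn try_decompose_shallow_wildcard(pattern: &str) -> Option<ShallowWildcard> {
    // Any of these metacharacters requires the general walker.
    if pattern.contains("**") || pattern.contains(|c: char| matches!(c, '?' | '[' | '{')) {
        return None;
    }

    let segments: Vec<&str> = pattern.split('/').collect();
    let mut star_index: Option<usize> = None;
    for (index, segment) in segments.iter().enumerate() {
        if *segment == "*" {
            if star_index.is_some() {
                return None; // more than one wildcard segment
            }
            star_index = Some(index);
        } else if segment.contains('*') {
            return None; // partial wildcard such as `pkg-*`
        }
    }

    // No `*` at all means the pattern is invariant, not shallow.
    let index = star_index?;

    let suffix = segments[index + 1..].join("/");
    if suffix.is_empty() {
        return None; // e.g. `packages/*` has nothing to stat under each entry
    }

    // Assumption: an empty prefix (pattern starting with `*/`) is accepted.
    let prefix = segments[..index].join("/");
    Some(ShallowWildcard { prefix, suffix })
}
```

With these rules, `packages/*/package.json` decomposes into the prefix `packages` and the suffix `package.json`, while `packages/**/package.json`, `packages/pkg-*/package.json`, and `packages/*` all return `None` and take the existing walk.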
## Results
Profiled with `--profile` on three monorepos of varying sizes (5, ~120,
and ~1,200 packages):
**Glob walk phase (workspace discovery):**
| Metric | Before | After |
|---|---|---|
| Slowest `walk_glob` (large repo) | 54ms | 6ms |
| `walk_glob` count (large repo) | 10 | 4 |
| `compile_globs` (large repo) | 20ms | 2ms |
| Slowest `walk_glob` (medium repo) | 14ms | 0.4ms |
**End-to-end `turbo build --dry-run` (5-run average, profile-internal
time):**
| Repo size | Baseline | With this change | Improvement |
|---|---|---|---|
| ~1,200 packages | 606ms | 455ms | **-25%** |
| ~120 packages | 308ms | 259ms | **-16%** |
Note: "With this change" includes other recent perf improvements on this
branch. The isolated improvement from this PR is ~60ms (~12%) on the
large repo.
## Testing
All 154 existing `globwalk` tests pass, including workspace-specific
tests that exercise exclusion patterns, nested packages, and edge cases.