perf: Replace `libgit2` git status with `gix-index` for faster file hashing (#11950)
## Summary
Replaces `RepoGitIndex`'s libgit2-based `git ls-tree` + `git status`
with a new code path that reads the `.git/index` file directly via
`gix-index`. This speeds up the most expensive git operation in
`turbo run` by replacing two separate libgit2 calls with a single index
read plus a parallel stat comparison.
## Results
**Profile data** (`RepoGitIndex::new`):
| Repo | libgit2 (before) | gix-index (after) | Improvement |
|---|---|---|---|
| Large (~500 packages, ~1700 tasks) | 397.8ms | 296.9ms | **-25%** |
**Wall-clock benchmarks** (hyperfine, `--dry --skip-infer`, 10+ warmup,
10+ runs):
| Repo | Speedup |
|---|---|
| Large (~500 packages) | **1.08-1.11x** |
| Medium (~120 packages) | **1.20-1.35x** |
| Small (~3 packages) | 1.00x |
Measured with `--profile` on three private repos of different sizes. All
profiles taken on the same machine, same base commit, clean working
trees.
The medium repo shows the biggest wall-clock improvement because git
operations are a larger fraction of total run time. The large repo has a
smaller relative improvement because other operations (engine build,
lockfile parsing, globwalk) dominate.
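
The relationship between the profile numbers and the wall-clock numbers can be sanity-checked with Amdahl's law. The sketch below is a back-of-the-envelope helper, not code from this PR: `overall_speedup` is a hypothetical name, and the fractions passed to it are illustrative assumptions rather than measurements.

```rust
/// Amdahl's law: overall speedup when a step that accounts for `fraction`
/// of total run time becomes `reduction` faster (both values in 0.0..=1.0).
///
/// Hypothetical helper for illustration only.
pub fn overall_speedup(fraction: f64, reduction: f64) -> f64 {
    // Time after the change, relative to a baseline of 1.0:
    // the untouched part (1 - fraction) plus the sped-up part
    // fraction * (1 - reduction), which simplifies to 1 - fraction * reduction.
    1.0 / (1.0 - fraction * reduction)
}
```

With the 25% reduction from the profile table, an assumed git share of 30% of total run time gives `overall_speedup(0.30, 0.25)`, about 1.08x, and an assumed share of 40% gives about 1.11x. The larger the share of run time spent in git operations, the closer the wall-clock speedup gets to the profile improvement.
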
## Why
`git_status_repo_root` (via libgit2's `repo.statuses()`) was the single
most expensive operation in `turbo run`, consuming 30-70% of total
profiled duration depending on repo size. It stat-checks every tracked
file AND walks the entire working tree for untracked files in a
single-threaded C call.
## What Changed
**New gix-index code path** (`repo_index.rs`):
- Reads `.git/index` via `gix-index` (mmap'd, ~2-5ms) to get every
tracked file's blob OID and cached stat data
- Stats each tracked file in parallel via rayon, comparing filesystem
stat against index stat using `gix_index::entry::Stat::matches()`
- Racy-git entries (entry mtime >= index timestamp) are deferred to
per-package `hash_objects` instead of being content-hashed inline, which
avoids reading every file from disk on fresh checkouts
- Uses nanosecond timestamp precision (`use_nsec: true`) to reduce false
racy entries on modern filesystems (APFS, ext4)
- Detects untracked files via the `ignore` crate's parallel walker
(respects `.gitignore`)
- Falls back to the existing libgit2 path if gix-index fails
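
The stat comparison and racy-git deferral described above can be sketched with the standard library alone. This is a simplified illustration, not the code in `repo_index.rs`: `CachedStat`, `EntryState`, `stat_file`, `classify`, and `classify_all` are hypothetical names, only mtime and size are compared (the real `Stat::matches()` checks more fields), and `std::thread::scope` stands in for rayon.

```rust
use std::path::Path;
use std::time::{Duration, UNIX_EPOCH};

/// Hypothetical, reduced stand-in for the stat data cached in an index entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CachedStat {
    /// Modification time since the UNIX epoch, with nanosecond precision.
    pub mtime: Duration,
    pub size: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryState {
    /// Stat matches and the entry is older than the index: reuse the blob OID.
    Clean,
    /// Stat differs: the file content must be re-hashed.
    Modified,
    /// Stat matches but mtime >= index timestamp: not trustworthy, defer hashing.
    Racy,
    /// The file no longer exists on disk.
    Deleted,
}

/// Reads the current stat of `path` without following symlinks.
pub fn stat_file(path: &Path) -> Option<CachedStat> {
    let meta = std::fs::symlink_metadata(path).ok()?;
    let mtime = meta.modified().ok()?.duration_since(UNIX_EPOCH).ok()?;
    Some(CachedStat { mtime, size: meta.len() })
}

/// Classifies one tracked entry from its cached stat and its on-disk stat.
pub fn classify(
    cached: CachedStat,
    on_disk: Option<CachedStat>,
    index_mtime: Duration,
) -> EntryState {
    match on_disk {
        None => EntryState::Deleted,
        Some(actual) if actual != cached => EntryState::Modified,
        // A file written in the same instant as the index could have changed
        // without its stat changing, so a matching stat proves nothing here.
        Some(_) if cached.mtime >= index_mtime => EntryState::Racy,
        Some(_) => EntryState::Clean,
    }
}

/// Classifies all entries on up to `threads` worker threads, preserving input order.
pub fn classify_all(
    entries: &[(CachedStat, Option<CachedStat>)],
    index_mtime: Duration,
    threads: usize,
) -> Vec<EntryState> {
    let threads = threads.max(1);
    // Ceiling division; `chunks` panics on a chunk size of 0, hence `.max(1)`.
    let chunk_len = ((entries.len() + threads - 1) / threads).max(1);
    std::thread::scope(|scope| {
        let handles: Vec<_> = entries
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&(cached, on_disk)| classify(cached, on_disk, index_mtime))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the results in the same order as the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

In this sketch only `Clean` entries can reuse the blob OID from the index. `Racy` entries are handed to a later hashing step, which matches the deferral to per-package `hash_objects` described in the list above.
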
**Dependency changes:**
- Added `gix-index` as an optional dependency behind a `gix` feature
flag (~27 new crates, all pure Rust)
**Optimizations applied:**
- Removed redundant sort of `ls_tree_hashes` (git index is already
sorted, rayon preserves order)
- Deferred OID hex conversion: the raw `ObjectId` is carried through the
parallel loop, and the hex string is allocated only for clean entries
- Binary search on sorted vecs instead of `HashSet` for untracked file
detection
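
As a rough illustration of the binary-search approach to untracked file detection, the sketch below filters walked paths against a sorted list of tracked paths. `find_untracked` is a hypothetical name and plain `String` paths are an assumption made for brevity. The approach relies on the tracked list being sorted in byte order, which is how Rust compares `str` values.

```rust
/// Returns the walked paths that do not appear in `tracked_sorted`.
///
/// `tracked_sorted` must be sorted in byte order; each lookup is then
/// O(log n) with no hashing and no extra allocation for a set.
pub fn find_untracked<'a>(tracked_sorted: &[String], walked: &'a [String]) -> Vec<&'a str> {
    debug_assert!(tracked_sorted.windows(2).all(|pair| pair[0] <= pair[1]));
    walked
        .iter()
        .map(String::as_str)
        .filter(|path| {
            // `Err` means the path is absent from the tracked list.
            tracked_sorted
                .binary_search_by(|tracked| tracked.as_str().cmp(*path))
                .is_err()
        })
        .collect()
}
```

Compared with building a `HashSet`, this reuses a list that is already sorted, so no per-path hashing or set construction is needed before the lookups.
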
**Test coverage:**
- 31 regression tests covering equivalence, edge cases (gitignore,
symlinks, prefix boundaries, racy-git), and contract guarantees (sorted
invariants, OID compatibility, determinism)
- Shared test utilities module (`test_utils.rs`)