turbo
fc19b667 - perf: Reduce per-package hashing overhead and eliminate SCM subprocesses (#11942)

## Summary

Follow-up to #11938. This PR targets the per-package hashing hot path, which dominates at scale, and eliminates the last two git subprocesses from `--dry` runs.

### Small repo (~6 packages)

| Branch | Mean | Range |
|---|---|---|
| **This PR** | 571.2ms ± 46.7ms | 515.6ms - 651.7ms |
| **main** | 587.4ms ± 45.1ms | 524.9ms - 676.3ms |
| Speedup | **1.03 ± 0.12x faster** | |

### Medium repo (~120 packages)

| Branch | Mean | Range |
|---|---|---|
| **This PR** | 1.096s ± 0.095s | 1.015s - 1.280s |
| **main** | 1.119s ± 0.072s | 1.042s - 1.259s |
| Speedup | **1.02 ± 0.11x faster** | |

### Large repo (~1000 packages)

| Branch | Mean | Range |
|---|---|---|
| **This PR** | 1.729s ± 0.151s | 1.548s - 1.969s |
| **main** | 1.833s ± 0.181s | 1.583s - 2.099s |
| Speedup | **1.06 ± 0.14x faster** | |

The small-repo results best isolate the fixed-cost improvements (git2 for branch/SHA, reduced allocation overhead), since per-package work there is minimal. At larger scales the improvements are still present but within noise, because wall-clock time is already well parallelized across rayon threads.

## Benchmarks

All benchmarks ran `turbo run <task> --skip-infer --dry` on a release build, with 5 warmup runs followed by 10 measured runs.

## Changes

- **FileHashes: HashMap to sorted Vec**: The inner type of `FileHashes` changed from a `HashMap` to a pre-sorted `Vec`. This eliminates `HashMap` construction (hashing, bucket allocation, rehashing) in the per-package hashing pipeline and removes redundant re-sorting in Cap'n Proto serialization. The sort happens once, at the construction boundary, so downstream consumers (`expanded_inputs`, `.hash()`) receive pre-sorted data for free.
- **Status entry binary search**: `get_package_hashes` now uses `partition_point` on pre-sorted status entries instead of a linear scan, reducing the per-package status lookup from O(dirty_files) to O(log(dirty_files) + matched). It also calls `with_capacity` on the per-package `HashMap` to avoid rehashing.
- **git2 for branch/SHA**: `get_current_branch` and `get_current_sha` (called by `SCMState::get` in `to_summary`) now use `git2::Repository` instead of spawning `git branch --show-current` and `git rev-parse HEAD` subprocesses. The git2 path is gated behind `#[cfg(feature = "git2")]`, with the subprocess implementation kept as a fallback.
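
The `FileHashes` change can be illustrated with a minimal sketch. The type below is hypothetical and is not turbo's actual `FileHashes`: the `(path, hash)` tuple layout, the `hash_value` method, and the use of `DefaultHasher` are assumptions made for illustration. Entries are sorted once in the constructor. After that, lookups use binary search and hashing iterates in order without re-sorting.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `FileHashes`: (path, content hash) pairs kept
/// sorted by path. Illustrative only; not turbo's actual definition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileHashes(Vec<(String, String)>);

impl FileHashes {
    /// Sorts exactly once, at the construction boundary.
    pub fn new(mut entries: Vec<(String, String)>) -> Self {
        // Stable sort, so for duplicate paths the input order decides which survives.
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        // Keep the first entry for each path (assumption: paths should be unique).
        entries.dedup_by(|later, earlier| later.0 == earlier.0);
        FileHashes(entries)
    }

    /// O(log n) lookup by binary search; no hashing or bucket allocation.
    pub fn get(&self, path: &str) -> Option<&str> {
        self.0
            .binary_search_by(|(p, _)| p.as_str().cmp(path))
            .ok()
            .map(|i| self.0[i].1.as_str())
    }

    /// Entries in path order; consumers get sorted data without re-sorting.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &str)> + '_ {
        self.0.iter().map(|(p, h)| (p.as_str(), h.as_str()))
    }

    /// Order-sensitive digest; deterministic only because entries are sorted.
    pub fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        for (path, digest) in &self.0 {
            path.hash(&mut hasher);
            digest.hash(&mut hasher);
        }
        hasher.finish()
    }
}
```

Because construction fixes the order, two inputs that list the same files in different orders produce equal values and equal digests. Iterating a std `HashMap` would not guarantee that, which is why a `HashMap`-backed version has to sort again before serializing or hashing.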
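
The status lookup relies on a property of sorted paths: every path under a given directory prefix forms one contiguous run. The following is a minimal sketch that assumes a hypothetical `StatusEntry` type with a repo-relative `path` and an `is_delete` flag. These are not turbo's real types, and `entries_for_package` and `package_status` are illustrative names. The sketch finds that run with two `partition_point` calls and pre-sizes the per-package `HashMap` with `with_capacity`.

```rust
use std::collections::HashMap;

/// Hypothetical status entry for a dirty file (illustrative, not turbo's type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StatusEntry {
    /// Repo-relative path using `/` separators.
    pub path: String,
    /// Whether the file was deleted in the working tree.
    pub is_delete: bool,
}

/// Returns the run of `entries` under `package_dir` using two binary searches.
/// Precondition: `entries` is sorted by `path` (byte order).
pub fn entries_for_package<'a>(entries: &'a [StatusEntry], package_dir: &str) -> &'a [StatusEntry] {
    // Add a trailing slash so "packages/foo" does not match "packages/foobar/...".
    // An empty `package_dir` (repo root) matches every entry.
    let prefix = if package_dir.is_empty() || package_dir.ends_with('/') {
        package_dir.to_string()
    } else {
        format!("{package_dir}/")
    };
    // Index of the first entry whose path is >= prefix.
    let start = entries.partition_point(|e| e.path.as_str() < prefix.as_str());
    // Paths sharing the prefix are contiguous in sorted order, so a second
    // partition_point over the tail finds where the run ends.
    let len = entries[start..].partition_point(|e| e.path.starts_with(prefix.as_str()));
    &entries[start..start + len]
}

/// Builds a per-package map of path -> is_delete from the matched run only.
pub fn package_status(entries: &[StatusEntry], package_dir: &str) -> HashMap<String, bool> {
    let matched = entries_for_package(entries, package_dir);
    // Capacity is known up front, so these inserts never trigger a rehash.
    let mut statuses = HashMap::with_capacity(matched.len());
    for entry in matched {
        statuses.insert(entry.path.clone(), entry.is_delete);
    }
    statuses
}
```

The trailing slash matters because byte-order sorting places `packages/foo-bar/...` before `packages/foo/...` and `packages/foobar/...` after it, since `-` < `/` < `b`. Comparing against `packages/foo/` excludes both neighbors. Only the matched entries are copied, which gives the O(log(dirty_files) + matched) cost described above.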