f03cdce2 - perf: Stream file contents during hashing to lower memory usage (#12059)

perf: Stream file contents during hashing to lower memory usage (#12059)

## Summary

- Both `hash_file` (gix path) and `git_like_hash_file` (manual fallback) previously called `std::fs::read()` / `read_to_end()`, loading entire files into memory before hashing. When rayon parallelizes hashing across many large files, this can OOM memory-constrained environments.
- Now both paths stat the file for its size, write the git blob header into the hasher, then stream through a 64KB `BufReader`. Peak memory per hash call is bounded regardless of file size.
- Hash output is identical, verified by tests comparing against `git hash-object`.

## What changed

- **`crates/turborepo-scm/src/hash_object.rs`**: `hash_file()` now uses `gix_index::hash::hasher` + `gix_object::encode::loose_header` to build the hasher with the blob header, then streams via `BufReader` instead of `std::fs::read`.
- **`crates/turborepo-scm/src/manual.rs`**: `git_like_hash_file()` writes the blob header using the file size from metadata, then streams through the `sha1::Sha1` hasher in 64KB chunks instead of `read_to_end`.

## Testing

- Extended `test_blob_hash_matches_git_hash_object` with 128KB (multi-buffer) and 64KB (exact-buffer-boundary) cases.
- Added `test_manual_hash_matches_git_hash_object`; the manual path previously had no test verifying hash correctness against `git hash-object`. The new test covers the same edge cases, including streaming buffer boundaries.