[turbopack] Optimize compaction cpu usage (#91468)
## Summary
Optimizes the turbo-persistence compaction and iteration paths with several targeted improvements:
### Iterator optimizations
- **Flatten index block iteration** — The iterator previously used a `Vec<CurrentIndexBlock>` stack, but SST files have exactly one index level. Inline the index block fields (`index_entries`, `index_block_count`, `index_pos`) directly into `StaticSortedFileIter`, eliminating the stack allocation and `Option` overhead.
- **Non-optional `CurrentKeyBlock`** — Parse the first key block during `try_into_iter()` construction so `current_key_block` is always populated, removing the `Option<CurrentKeyBlock>` wrapper and its `take()`/`Some()` ceremony in the hot loop.
- **Replace `ReadBytesExt` with direct byte indexing** — In `handle_key_match`, `parse_key_block`, and `next_internal`, replace `val.read_u16::<BE>()` etc. with `u16::from_be_bytes(val[0..2].try_into().unwrap())`. This eliminates the mutable slice-pointer advancement that `ReadBytesExt` performs on every read.
- **Extract `read_offset_entry` helper** — Read type + offset from the key block offset table in a single `u32` load + shift, replacing two separate `ReadBytesExt` calls.
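A minimal sketch of the two decoding patterns above. The helper names and the assumed field layout (top byte = entry type, low 24 bits = offset) are illustrative; the actual key-block layout in turbo-persistence may differ.

```rust
/// Read a big-endian u16 by direct indexing, replacing
/// `ReadBytesExt::read_u16::<BE>()`. No mutable cursor is advanced;
/// the caller tracks positions explicitly.
fn read_u16_be(buf: &[u8], pos: usize) -> u16 {
    u16::from_be_bytes(buf[pos..pos + 2].try_into().unwrap())
}

/// Read a (type, offset) pair from an offset table in a single u32
/// load plus shift/mask, instead of two separate reads.
/// Assumed layout: top byte is the entry type, low 24 bits the offset.
fn read_offset_entry(table: &[u8], index: usize) -> (u8, u32) {
    let raw = u32::from_be_bytes(table[index * 4..index * 4 + 4].try_into().unwrap());
    ((raw >> 24) as u8, raw & 0x00ff_ffff)
}
```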
### Refcounting optimization
- **Introduce `RcBytes`** — Thread-local byte slice type using `Rc` instead of `Arc`, eliminating atomic refcount overhead during single-threaded SST iteration. The iteration path (`StaticSortedFileIter`) now produces `RcBytes` slices backed by an `Rc<Mmap>`, so per-entry clone/drop operations are plain integer increments rather than atomic operations.
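A rough sketch of the `RcBytes` idea: a byte range into a shared, non-atomically refcounted buffer, so cloning an entry costs a plain integer increment. The type and field names here are illustrative, and a plain `Rc<[u8]>` stands in for the `Rc<Mmap>` backing used in the real iteration path.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Illustrative `RcBytes`-style type: a sub-range of a shared buffer,
/// refcounted with `Rc` (non-atomic) rather than `Arc` (atomic).
#[derive(Clone)]
struct RcBytes {
    buffer: Rc<[u8]>,
    start: usize,
    end: usize,
}

impl RcBytes {
    /// Borrow a range of the shared buffer; bumps the non-atomic refcount.
    fn slice(buffer: &Rc<[u8]>, start: usize, end: usize) -> Self {
        RcBytes { buffer: Rc::clone(buffer), start, end }
    }
}

impl Deref for RcBytes {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.buffer[self.start..self.end]
    }
}
```

Because `Rc` is `!Send`, values like this cannot leak across threads, which matches the single-threaded SST iteration described above.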
### Merge iterator simplification
- **Optimize `MergeIter::next`** — Replace the straightforward `pop`/`push` pattern with a `PeekMut`-based replace-top pattern, so the heap is adjusted once per iteration instead of twice.
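A hedged sketch of the replace-top pattern on a toy k-way merge (the `Source` struct and `merge_next` are illustrative, not the actual `MergeIter` code). Advancing the top source in place through `PeekMut` re-sifts the heap once when the guard drops, versus the two sifts of `pop()` followed by `push()`.

```rust
use std::cmp::Ordering;
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

/// One source in the k-way merge: its current head plus remaining items.
struct Source {
    head: u32,
    rest: std::vec::IntoIter<u32>,
}

// Order by current head, inverted: BinaryHeap is a max-heap, we want a min-heap.
impl Ord for Source {
    fn cmp(&self, other: &Self) -> Ordering {
        other.head.cmp(&self.head)
    }
}
impl PartialOrd for Source {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Source {
    fn eq(&self, other: &Self) -> bool {
        self.head == other.head
    }
}
impl Eq for Source {}

/// Yield the next merged item. Instead of `pop()` + `push()` (two sift
/// operations), advance the top source in place; the heap re-sifts once
/// when the `PeekMut` guard is dropped.
fn merge_next(heap: &mut BinaryHeap<Source>) -> Option<u32> {
    let mut top = heap.peek_mut()?;
    let item = top.head;
    match top.rest.next() {
        Some(next) => top.head = next,  // replace-top: single sift on drop
        None => {
            PeekMut::pop(top); // source exhausted, remove it from the heap
        }
    }
    Some(item)
}
```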
## Benchmark results
### Compaction (`key_8/value_4/entries_16.00Mi/commits_128`)
| Benchmark | Canary | Optimized | Change |
|-----------|--------|-----------|--------|
| partial compaction | 1.949 s | 1.515 s | **-22%** |
| full compaction | 2.051 s | 1.542 s | **-25%** |
### Read path (`static_sorted_file_lookup/entries_1000.00Ki`)
No read regression — the branch is neutral to slightly faster:
| Benchmark | Canary | Optimized | Change |
|-----------|--------|-----------|--------|
| hit/uncached | 6.73 µs | 6.59 µs | **-2%** |
| hit/cached | 140.8 ns | 130.7 ns | **-7%** |
| miss/uncached | 5.10 µs | 5.02 µs | **-2%** |
| miss/cached | 230.1 ns | 233.1 ns | ~+1% (noise) |
## Test plan
- [x] `cargo test -p turbo-persistence` — 60/60 tests passing
- [x] Compaction benchmarks run and compared against canary baseline
- [x] Read path (lookup) benchmarks verified no regression