[turbopack] Optimize compaction cpu usage (#91468)
## Summary
Optimizes the turbo-persistence compaction and iteration paths with several targeted improvements:
### Iterator optimizations
- **Flatten index block iteration** — The iterator previously used a `Vec<CurrentIndexBlock>` stack, but SST files have exactly one index level. Inline the index block fields (`index_entries`, `index_block_count`, `index_pos`) directly into `StaticSortedFileIter`, eliminating the stack allocation and `Option` overhead.
- **Non-optional `CurrentKeyBlock`** — Parse the first key block during `try_into_iter()` construction so `current_key_block` is always populated, removing the `Option<CurrentKeyBlock>` wrapper and its `take()`/`Some()` ceremony in the hot loop.
- **Replace `ReadBytesExt` with direct byte indexing** — In `handle_key_match`, `parse_key_block`, and `next_internal`, replace `val.read_u16::<BE>()` etc. with `u16::from_be_bytes(val[0..2].try_into().unwrap())`. This eliminates the mutable slice-pointer advancement that `ReadBytesExt` performs on every read.
- **Extract `read_offset_entry` helper** — Read type + offset from the key block offset table in a single `u32` load + shift, replacing two separate `ReadBytesExt` calls.
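A minimal sketch of the two decoding patterns above. The helper names and the assumed field layout (top byte = entry type, low 24 bits = offset) are illustrative; the actual key-block layout in turbo-persistence may differ.

```rust
/// Read a big-endian u16 by direct indexing, replacing
/// `ReadBytesExt::read_u16::<BE>()`. No mutable cursor is advanced;
/// the caller tracks positions explicitly.
fn read_u16_be(buf: &[u8], pos: usize) -> u16 {
    u16::from_be_bytes(buf[pos..pos + 2].try_into().unwrap())
}

/// Read a (type, offset) pair from an offset table in a single u32
/// load plus shift/mask, instead of two separate reads.
/// Assumed layout: top byte is the entry type, low 24 bits the offset.
fn read_offset_entry(table: &[u8], index: usize) -> (u8, u32) {
    let raw = u32::from_be_bytes(table[index * 4..index * 4 + 4].try_into().unwrap());
    ((raw >> 24) as u8, raw & 0x00ff_ffff)
}
```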
### Refcounting optimization
- **Introduce `RcBytes`** — Thread-local byte slice type using `Rc` instead of `Arc`, eliminating atomic refcount overhead during single-threaded SST iteration. The iteration path (`StaticSortedFileIter`) now produces `RcBytes` slices backed by an `Rc<Mmap>`, so per-entry clone/drop operations are plain integer increments rather than atomic operations.
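A rough sketch of the `RcBytes` idea: a byte range into a shared, non-atomically refcounted buffer, so cloning an entry costs a plain integer increment. The type and field names here are illustrative, and a plain `Rc<[u8]>` stands in for the `Rc<Mmap>` backing used in the real iteration path.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Illustrative `RcBytes`-style type: a sub-range of a shared buffer,
/// refcounted with `Rc` (non-atomic) rather than `Arc` (atomic).
#[derive(Clone)]
struct RcBytes {
    buffer: Rc<[u8]>,
    start: usize,
    end: usize,
}

impl RcBytes {
    /// Borrow a range of the shared buffer; bumps the non-atomic refcount.
    fn slice(buffer: &Rc<[u8]>, start: usize, end: usize) -> Self {
        RcBytes { buffer: Rc::clone(buffer), start, end }
    }
}

impl Deref for RcBytes {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.buffer[self.start..self.end]
    }
}
```

Because `Rc` is `!Send`, values like this cannot leak across threads, which matches the single-threaded SST iteration described above.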
### Merge iterator simplification
- **Optimize `MergeIter::next`** — Replace the straightforward `pop`/`push` pattern with a `PeekMut`-based replace-top pattern, so the heap is adjusted once per iteration instead of twice.
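A hedged sketch of the replace-top pattern on a toy k-way merge (the `Source` struct and `merge_next` are illustrative, not the actual `MergeIter` code). Advancing the top source in place through `PeekMut` re-sifts the heap once when the guard drops, versus the two sifts of `pop()` followed by `push()`.

```rust
use std::cmp::Ordering;
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

/// One source in the k-way merge: its current head plus remaining items.
struct Source {
    head: u32,
    rest: std::vec::IntoIter<u32>,
}

// Order by current head, inverted: BinaryHeap is a max-heap, we want a min-heap.
impl Ord for Source {
    fn cmp(&self, other: &Self) -> Ordering {
        other.head.cmp(&self.head)
    }
}
impl PartialOrd for Source {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Source {
    fn eq(&self, other: &Self) -> bool {
        self.head == other.head
    }
}
impl Eq for Source {}

/// Yield the next merged item. Instead of `pop()` + `push()` (two sift
/// operations), advance the top source in place; the heap re-sifts once
/// when the `PeekMut` guard is dropped.
fn merge_next(heap: &mut BinaryHeap<Source>) -> Option<u32> {
    let mut top = heap.peek_mut()?;
    let item = top.head;
    match top.rest.next() {
        Some(next) => top.head = next,  // replace-top: single sift on drop
        None => {
            PeekMut::pop(top); // source exhausted, remove it from the heap
        }
    }
    Some(item)
}
```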
## Benchmark results
### Compaction (`key_8/value_4/entries_16.00Mi/commits_128`)
| Benchmark | Canary | Optimized | Change |
|-----------|--------|-----------|--------|
| partial compaction | 1.949 s | 1.515 s | **-22%** |
| full compaction | 2.051 s | 1.542 s | **-25%** |
### Read path (`static_sorted_file_lookup/entries_1000.00Ki`)
No read regression — the branch is neutral to slightly faster:
| Benchmark | Canary | Optimized | Change |
|-----------|--------|-----------|--------|
| hit/uncached | 6.73 µs | 6.59 µs | **-2%** |
| hit/cached | 140.8 ns | 130.7 ns | **-7%** |
| miss/uncached | 5.10 µs | 5.02 µs | **-2%** |
| miss/cached | 230.1 ns | 233.1 ns | ~+1% (noise) |
## Test plan
- [x] `cargo test -p turbo-persistence` — 60/60 tests passing
- [x] Compaction benchmarks run and compared against canary baseline
- [x] Read path (lookup) benchmarks verified no regression