[turbopack] Add support for fixed key blocks (#90844)
## What?
Adds fixed-size key block types to turbo-persistence SST files. When all entries in a key block share the same key size and value type, the 4-byte-per-entry offset table is eliminated entirely. Entry positions become a direct arithmetic calculation: `header_size + index * stride`.
Also fixes a bug in `sst_inspect` where value blocks were misidentified as key blocks (value blocks have no type header, so their raw data could coincidentally match a key block type byte).
## Why?
Many turbo-persistence tables have uniform key and value sizes. For example, TaskCache stores 4-byte task ID keys with 4-byte inline values: every single entry is identical in structure. The existing variable-size key block format stores a 4-byte offset-table entry (1B type + 3B position) for every key in order to support variable-length entries. For uniform blocks this offset table is pure overhead, in both space and read-path indirection: binary search must chase an offset-table pointer at every step.
## How?
**New block types 3 and 4** (parallel to existing types 1/2 for hash/no-hash):
```
Variable: [1B type][3B count][4B offset × count][entries...]
Fixed: [1B type][3B count][1B key_size][1B value_type][entries...]
```
The 2 extra header bytes (key_size + value_type) are amortized across all entries, since removing the offset table saves 4 bytes per entry; the fixed format is already smaller at a single entry.
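The layout and break-even arithmetic can be sketched as follows. This is an illustrative model, not the actual turbo-persistence code; `HEADER_SIZE` and the function names are assumptions based on the format diagram above.

```rust
// Fixed header: 1B type + 3B count + 1B key_size + 1B value_type.
const FIXED_HEADER_SIZE: usize = 6;

/// Byte offset of entry `index` in a fixed key block: pure arithmetic,
/// no offset-table lookup.
fn fixed_entry_offset(index: usize, stride: usize) -> usize {
    FIXED_HEADER_SIZE + index * stride
}

/// Total size of a fixed key block holding `count` uniform entries.
fn fixed_block_size(count: usize, stride: usize) -> usize {
    FIXED_HEADER_SIZE + count * stride
}

/// Total size of the variable layout for the same entries:
/// 1B type + 3B count + 4B offset-table entry per entry + entry data.
fn variable_block_size(count: usize, stride: usize) -> usize {
    4 + count * 4 + count * stride
}

fn main() {
    // TaskCache-like entries: 4B key + 4B inline value = 8B stride.
    let stride = 8;
    assert_eq!(fixed_entry_offset(0, stride), 6);
    assert_eq!(fixed_entry_offset(100, stride), 806);
    // The fixed layout is already smaller with a single entry.
    assert!(fixed_block_size(1, stride) < variable_block_size(1, stride));
    println!(
        "1000 entries: fixed = {} B, variable = {} B",
        fixed_block_size(1000, stride),
        variable_block_size(1000, stride)
    );
}
```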
**Writer changes:**
- `KeyBlockFormat` enum with state machine (`Unknown → Fixed → Variable`) tracks uniformity as entries are added to the accumulator
- `FixedKeyBlockBuilder` writes the compact 6-byte header + contiguous entries with no offset table
- Falls back to variable-size automatically when entries aren't uniform
- `ValueRef::write_value_to()` shared across both builder types to avoid duplication
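A minimal sketch of the uniformity-tracking state machine described above; the real `KeyBlockFormat` in turbo-persistence may differ in detail, and `add_entry` is an illustrative name.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum KeyBlockFormat {
    /// No entries seen yet.
    Unknown,
    /// All entries so far share this key size and value type.
    Fixed { key_size: u8, value_type: u8 },
    /// Entries are not uniform; fall back to the offset-table layout.
    Variable,
}

impl KeyBlockFormat {
    /// Fold one entry into the state. Transitions are one-way:
    /// Unknown -> Fixed on the first entry, Fixed -> Variable on any mismatch,
    /// and Variable is absorbing.
    fn add_entry(self, key_size: u8, value_type: u8) -> Self {
        match self {
            KeyBlockFormat::Unknown => KeyBlockFormat::Fixed { key_size, value_type },
            KeyBlockFormat::Fixed { key_size: ks, value_type: vt }
                if ks == key_size && vt == value_type => self,
            _ => KeyBlockFormat::Variable,
        }
    }
}

fn main() {
    let mut fmt = KeyBlockFormat::Unknown;
    fmt = fmt.add_entry(4, 1);
    fmt = fmt.add_entry(4, 1);
    assert_eq!(fmt, KeyBlockFormat::Fixed { key_size: 4, value_type: 1 });
    fmt = fmt.add_entry(8, 1); // a non-uniform entry forces the fallback
    assert_eq!(fmt, KeyBlockFormat::Variable);
}
```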
**Reader changes:**
- `lookup_fixed_key_block()` — binary search where each probe's byte offset is computed by stride arithmetic, with no offset-table indirection per step
- `get_fixed_key_entry()` — direct index calculation instead of offset table indirection
- Iterator refactored with `CurrentKeyBlockKind` enum (Variable vs Fixed variants)
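The read path can be illustrated with the following sketch of a stride-based binary search. The block layout matches the fixed format above (6-byte header, then contiguous key + value entries); the signature and names are assumptions, not the exact turbo-persistence API.

```rust
const FIXED_HEADER_SIZE: usize = 6;

/// Binary search over a fixed key block for `key`. Each probe computes its
/// offset as `FIXED_HEADER_SIZE + mid * stride`; no offset-table load occurs.
/// Returns the entry index on a hit.
fn lookup_fixed_key_block(
    block: &[u8],
    count: usize,
    stride: usize,
    key_size: usize,
    key: &[u8],
) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, count);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let start = FIXED_HEADER_SIZE + mid * stride; // pure arithmetic
        let mid_key = &block[start..start + key_size];
        match mid_key.cmp(key) {
            std::cmp::Ordering::Less => lo = mid + 1,
            std::cmp::Ordering::Greater => hi = mid,
            std::cmp::Ordering::Equal => return Some(mid),
        }
    }
    None
}

fn main() {
    // Three entries of 4B big-endian keys + 4B values (stride 8), sorted.
    let mut block = vec![0u8; FIXED_HEADER_SIZE];
    for k in [1u32, 5, 9] {
        block.extend_from_slice(&k.to_be_bytes()); // key
        block.extend_from_slice(&[0; 4]); // value
    }
    assert_eq!(lookup_fixed_key_block(&block, 3, 8, 4, &5u32.to_be_bytes()), Some(1));
    assert_eq!(lookup_fixed_key_block(&block, 3, 8, 4, &7u32.to_be_bytes()), None);
}
```

Big-endian keys keep lexicographic byte comparison consistent with numeric order, which is why the plain slice `cmp` suffices here.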
**sst_inspect fix:** Reads the index block first to determine which block indices are key blocks, rather than guessing from the first byte of block data.
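The fix amounts to classifying blocks by membership in a set of indices taken from the index block, rather than by sniffing block contents. A hypothetical sketch (all names here are illustrative, not the actual `sst_inspect` code):

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum BlockKind {
    Key,
    Value,
}

/// Classify every block by membership in the key-block index set read from
/// the index block, instead of peeking at the first byte of block data
/// (a value block has no type header, so its raw bytes can coincidentally
/// match a key-block type byte).
fn classify_blocks(key_block_indices: &HashSet<usize>, block_count: usize) -> Vec<BlockKind> {
    (0..block_count)
        .map(|i| {
            if key_block_indices.contains(&i) {
                BlockKind::Key
            } else {
                BlockKind::Value
            }
        })
        .collect()
}

fn main() {
    // Suppose the index block reports blocks 0 and 2 as key blocks.
    let key_blocks: HashSet<usize> = [0, 2].into_iter().collect();
    let kinds = classify_blocks(&key_blocks, 4);
    assert_eq!(
        kinds,
        vec![BlockKind::Key, BlockKind::Value, BlockKind::Key, BlockKind::Value]
    );
}
```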
### Real-world impact (vercel-site .next cache, ~9.5M entries per family)
| Family | Fixed Blocks | Variable Blocks | Key Block Size | Notes |
|--------|-------------|----------------|---------------|-------|
| **TaskCache** | **16,274 (100%)** | 0 | **108.82 MB** (was 145 MB, **-25%**) | All inline 4B values |
| TaskMeta | 10,078 (72%) | 3,877 (28%) | 118.86 MB (was ~145 MB) | Variable blocks contain rare medium values |
| TaskData | 39 (0.3%) | 13,943 (99.7%) | 144.45 MB | Medium values spread across most blocks |
| Infra | 0 | 1 | 25 B | Mixed inline sizes |
This also saves 73 MB (~2%) of the overall cache size.
### Read-path benchmarks (`static_sorted_file_lookup`, 8B keys + 4B values)
| Benchmark | Canary | Fixed Blocks | Change |
|---|---|---|---|
| **1Ki hit/cached** | 534 ns | 510 ns | **-4.5%** |
| **10Ki hit/cached** | 559 ns | 556 ns | ~0% |
| **100Ki hit/cached** | 566 ns | 503 ns | **-11.1%** |
| **1Mi hit/cached** | 573 ns | 519 ns | **-9.4%** |
| 1Ki miss/cached | 571 ns | 571 ns | ~0% |
| 10Ki miss/cached | 613 ns | 556 ns | **-9.3%** |
| 100Ki miss/cached | 639 ns | 593 ns | **-7.2%** |
| 1Mi miss/cached | 791 ns | 702 ns | **-11.3%** |
| 1Ki hit/uncached | 4.07 µs | 3.81 µs | **-6.4%** |
| 10Ki hit/uncached | 5.33 µs | 5.03 µs | **-5.6%** |
| 100Ki hit/uncached | 5.55 µs | 5.24 µs | **-5.6%** |
| 1Mi hit/uncached | 8.47 µs | 8.31 µs | **-1.9%** |
| 1Ki miss/uncached | 3.37 µs | 3.19 µs | **-5.3%** |
| 10Ki miss/uncached | 3.87 µs | 3.41 µs | **-11.9%** |
| 100Ki miss/uncached | 4.15 µs | 3.70 µs | **-10.8%** |
| 1Mi miss/uncached | 6.98 µs | 6.66 µs | **-4.6%** |
Cached lookups (the hot path in production) improve by **up to 11%**, since binary search dominates there and eliminating the offset table matters most. Uncached lookups also improve 2-12% due to smaller blocks and the faster binary search.