[turbo-trace-server] optimize loading even more (#93332)
## What
Reduce time and memory spent loading large traces in
`turbopack-trace-server`, focused on the row-ingestion hot path
(`reader/turbopack.rs`).
**Eager interning at parse time**
- New lifetime-free `OwnedTraceValue` enum, and `InternalRow` value lists now reuse the `SpanArgs` type. All borrowed `Cow<str>` keys and values are interned to `RcStr` exactly once, at parse time, through a single `intern_span_args` helper.
- `InternalRow`/`InternalRowType` drop their `'a` lifetime parameter, and the `into_static` path is gone. Previously, every row that had to be queued while waiting for its parent allocated a fresh `String` for each `Cow` field, and the processor later re-interned those strings. Both costs are now gone: each string is interned once, with no intermediate `String` allocation.
- `Event` rows pre-extract `duration` (from `TraceValue::UInt`) and
`name` (from `TraceValue::String`) at parse time, so the per-event
`FxIndexMap` `swap_remove` lookup goes away entirely.
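A minimal sketch of parse-time interning and event-field pre-extraction, under simplified assumptions: `TraceValue`, `Interner`, `EventRow`, `intern_value` and `parse_event` are hypothetical stand-ins, `Rc<str>` substitutes for `RcStr`, and the `"duration"`/`"name"` key strings are assumed. It is not the code in `reader/turbopack.rs`; it only illustrates converting borrowed `Cow<str>` pairs into interned, owned values in one pass, so later stages carry no lifetime and make no per-row `String` copies.

```rust
use std::borrow::Cow;
use std::collections::HashSet;
use std::rc::Rc;

/// Borrowed value as produced by the parser (hypothetical, simplified).
#[derive(Debug)]
pub enum TraceValue<'a> {
    String(Cow<'a, str>),
    UInt(u64),
    Bool(bool),
}

/// Lifetime-free value; `Rc<str>` stands in for turbopack's `RcStr`.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedTraceValue {
    String(Rc<str>),
    UInt(u64),
    Bool(bool),
}

pub type SpanArgs = Vec<(Rc<str>, OwnedTraceValue)>;

/// Deduplicating interner: equal strings share one allocation.
#[derive(Default)]
pub struct Interner {
    set: HashSet<Rc<str>>,
}

impl Interner {
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.set.get(s) {
            return Rc::clone(existing);
        }
        let rc: Rc<str> = Rc::from(s);
        self.set.insert(Rc::clone(&rc));
        rc
    }
}

fn intern_value(interner: &mut Interner, v: &TraceValue<'_>) -> OwnedTraceValue {
    match v {
        TraceValue::String(s) => OwnedTraceValue::String(interner.intern(s)),
        TraceValue::UInt(n) => OwnedTraceValue::UInt(*n),
        TraceValue::Bool(b) => OwnedTraceValue::Bool(*b),
    }
}

/// Interns every borrowed key/value pair exactly once, at parse time.
pub fn intern_span_args(interner: &mut Interner, args: &[(Cow<'_, str>, TraceValue<'_>)]) -> SpanArgs {
    let mut out = Vec::with_capacity(args.len());
    for (k, v) in args {
        let key = interner.intern(k);
        out.push((key, intern_value(interner, v)));
    }
    out
}

/// Event row with `duration` and `name` pulled out at parse time, so the
/// processor never has to search the argument list for them.
pub struct EventRow {
    pub duration: Option<u64>,
    pub name: Option<Rc<str>>,
    pub args: SpanArgs,
}

pub fn parse_event(interner: &mut Interner, args: &[(Cow<'_, str>, TraceValue<'_>)]) -> EventRow {
    let mut duration = None;
    let mut name = None;
    let mut rest = Vec::with_capacity(args.len());
    for (k, v) in args {
        let key: &str = k;
        match (key, v) {
            ("duration", TraceValue::UInt(d)) => duration = Some(*d),
            ("name", TraceValue::String(s)) => name = Some(interner.intern(s)),
            _ => {
                // Any other argument is interned and kept in the list.
                let key = interner.intern(key);
                rest.push((key, intern_value(interner, v)));
            }
        }
    }
    EventRow { duration, name, args: rest }
}
```

Because every string goes through the same interner, a name that appears in many rows is stored once and shared by reference count.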
**Inline queue flush instead of working-queue extend**
- `process_internal_row` and `process_internal_row_queue` collapsed into
a single recursive method. When a `Start` row flushes queued children
(i.e. orphaned rows whose parent has now appeared), those rows are
processed inline via direct recursion instead of being `extend`ed into a
working queue and re-driven. Eliminates the `Vec<InternalRow>::extend`
memcpy that showed up as a top memmove site, and keeps the just-added
`Span` hot in cache while its child events are applied.
- `row_queue` field and the `take`/swap driver loop on `TurbopackFormat`
removed.
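A minimal sketch of the recursive flush, assuming a simplified row model: `Row`, `Span`, `Processor` and the `pending` map keyed by parent id are hypothetical and are not the real `TurbopackFormat` fields. When a `Start` row creates a span, any rows waiting on that span are processed immediately by calling `process_row` again, rather than being pushed onto a separate working queue and re-driven.

```rust
use std::collections::HashMap;

/// Simplified row model (hypothetical).
pub enum Row {
    Start { id: u64, parent: Option<u64>, name: String },
    Event { parent: u64, name: String },
}

pub struct Span {
    pub name: String,
    pub events: Vec<String>,
}

#[derive(Default)]
pub struct Processor {
    pub spans: Vec<Span>,
    /// Maps a span id to its position in `spans`.
    index: HashMap<u64, usize>,
    /// Rows whose parent span has not been seen yet, keyed by parent id.
    pending: HashMap<u64, Vec<Row>>,
}

impl Processor {
    pub fn process_row(&mut self, row: Row) {
        match row {
            Row::Start { id, parent, name } => {
                if let Some(p) = parent {
                    if !self.index.contains_key(&p) {
                        // Parent not seen yet: park the row until it appears.
                        self.pending.entry(p).or_default().push(Row::Start { id, parent, name });
                        return;
                    }
                }
                self.index.insert(id, self.spans.len());
                self.spans.push(Span { name, events: Vec::new() });
                // Flush rows that were waiting for this span by recursing
                // directly, instead of appending them to a working queue.
                if let Some(children) = self.pending.remove(&id) {
                    for child in children {
                        self.process_row(child);
                    }
                }
            }
            Row::Event { parent, name } => match self.index.get(&parent) {
                Some(&i) => self.spans[i].events.push(name),
                None => self.pending.entry(parent).or_default().push(Row::Event { parent, name }),
            },
        }
    }
}
```

One trade-off of direct recursion is that a very deep chain of queued spans becomes an equally deep call stack; the sketch does not guard against that.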
**New `ChunkedVec` storage for `Span`**
- Reduces the cost of adding `Span` objects: spans are large and expensive to move, and appending to chunked storage does not move the spans already stored.
- Reduces peak heap usage caused by resizing, in exchange for a larger minimum allocation.
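A minimal sketch of the idea, assuming a fixed chunk size (this `ChunkedVec` is a hypothetical stand-in; the real type's growth policy may differ). Elements live in fixed-capacity chunks, so a push either fills the current chunk or allocates a new one, and previously stored elements are never copied.

```rust
/// Hypothetical chunked vector with a fixed chunk size.
pub struct ChunkedVec<T> {
    chunks: Vec<Vec<T>>,
    chunk_size: usize,
    len: usize,
}

impl<T> ChunkedVec<T> {
    pub fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self { chunks: Vec::new(), chunk_size, len: 0 }
    }

    pub fn push(&mut self, value: T) {
        let chunk_full = self.chunks.last().map_or(true, |c| c.len() == self.chunk_size);
        if chunk_full {
            // Allocate a whole chunk at once: the larger minimum allocation.
            self.chunks.push(Vec::with_capacity(self.chunk_size));
        }
        // Capacity is at least `chunk_size`, so this push never reallocates
        // the chunk and never moves previously stored elements.
        self.chunks.last_mut().expect("a chunk exists").push(value);
        self.len += 1;
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        if index >= self.len {
            return None;
        }
        self.chunks[index / self.chunk_size].get(index % self.chunk_size)
    }

    pub fn get_mut(&mut self, index: usize) -> Option<&mut T> {
        if index >= self.len {
            return None;
        }
        let chunk_size = self.chunk_size;
        self.chunks[index / chunk_size].get_mut(index % chunk_size)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }
}
```

Only the outer `Vec<Vec<T>>` ever reallocates, and then it moves just the small chunk handles, not the large elements. The cost is that the first push allocates a full chunk, and a partially filled last chunk holds unused capacity.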
## Result
Loading a 10 GB trace: **53s → 45s** wall clock, **21.5 GB → 18.8 GB** peak heap. Loading throughput reaches roughly 220 MB/s.
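For reference, the headline numbers work out as follows, assuming decimal units (1 GB = 1000 MB):

```math
\begin{aligned}
\text{throughput} &= \frac{10\ \text{GB}}{45\ \text{s}} \approx 0.222\ \text{GB/s} \approx 222\ \text{MB/s} \\
\text{wall-clock reduction} &= \frac{53 - 45}{53} \approx 15.1\% \\
\text{peak-heap reduction} &= \frac{21.5 - 18.8}{21.5} \approx 12.6\%
\end{aligned}
```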
<!-- NEXT_JS_LLM_PR -->