experimental.prefetchInlining: bundle segment prefetches into a single response (#90555)
## Background
Next.js 16 introduced per-segment prefetching through the Client Segment
Cache. Rather than fetching all data for a route in a single request,
the client issues individual requests for each segment in the route
tree. This design improves cache efficiency: shared layouts between
sibling routes (e.g., /dashboard/settings and /dashboard/profile sharing
a /dashboard layout) are fetched once and reused from the client cache,
avoiding redundant data transfer.

The trade-off is request volume. A route with N segments now produces N
prefetch requests instead of one. Users upgrading from older versions
notice significantly more network activity in their devtools, even
though the total bytes transferred may be similar or lower due to
deduplication.

Per-segment fetching is still a reasonable default for many sites. These
prefetch requests are served from the cache, complete quickly, and run in
parallel. The main scenario where the trade-off breaks down is
deployment environments that charge per-request. But even setting aside
cost, there is a theoretical performance threshold where very small
segments are better off inlined — the per-request overhead (connection
setup, headers, scheduling) exceeds the cost of transferring duplicate
bytes. This is analogous to JS bundlers, which inline small modules
rather than creating separate chunks, because the overhead of an
additional script tag or dynamic import outweighs the bytes saved.
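The break-even argument above can be made concrete with a toy cost model. Everything in this sketch is illustrative: the overhead constant and segment sizes are invented numbers, not measurements from Next.js.

```typescript
// Toy cost model (illustrative only; constants are invented, not measured).
// Compares N separate segment prefetches against one inlined response that
// must re-transfer any bytes that deduplication would otherwise have saved.
const PER_REQUEST_OVERHEAD = 500; // hypothetical bytes-equivalent of headers + connection setup

function separateCost(segmentSizes: number[]): number {
  // One request per segment; shared layouts stay deduplicated in the client cache.
  return segmentSizes.reduce((sum, size) => sum + size + PER_REQUEST_OVERHEAD, 0);
}

function inlinedCost(segmentSizes: number[], duplicatedBytes: number): number {
  // A single request carries every segment plus the duplicate copies of
  // shared-layout data that sibling routes can no longer share.
  const payload = segmentSizes.reduce((sum, size) => sum + size, 0);
  return payload + duplicatedBytes + PER_REQUEST_OVERHEAD;
}

// Small segments: the saved per-request overhead outweighs the duplicates.
console.log(separateCost([300, 200, 150]), inlinedCost([300, 200, 150], 400)); // 2150 1550

// Large segments with heavy sharing: deduplication wins.
console.log(separateCost([30000, 20000]), inlinedCost([30000, 20000], 45000)); // 51000 95500
```

The crossover depends entirely on how large the per-request overhead is relative to the duplicated bytes, which is what motivates the size-based heuristic discussed under "Future direction".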
## What this change does
This adds `experimental.prefetchInlining`, a boolean option in
next.config.js. When enabled, the server bundles all segment data for a
route into a single `/_inlined` response rather than serving each
segment individually. The tree prefetch (`/_tree`), which provides route
structure metadata, remains a separate request — but optimistic routing
(#88965) eliminates that request entirely by predicting the route
structure client-side. With both features enabled, prefetching is
effectively one request per link.

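For reference, a minimal configuration sketch. Only `prefetchInlining` comes from this PR; the typed `next.config.ts` form is used here for illustration, and the plain `next.config.js` form mentioned above is equivalent.

```typescript
// next.config.ts — enabling the experimental flag added by this PR.
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  experimental: {
    // Bundle all segment data for a route into a single /_inlined response.
    prefetchInlining: true,
  },
};

export default nextConfig;
```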
The fundamental trade-off is straightforward: inlining reduces request
count at the cost of deduplication. Each inlined response includes its
own copy of any shared layout data, so two sibling routes will each
transfer the shared layout rather than sharing a single cached copy.
This is the same trade-off that compilers and bundlers face when
deciding whether to inline a function: inlining eliminates the overhead
of indirection (here, extra HTTP requests) but increases total size when
the same data appears in multiple call sites.
## Future direction
The boolean flag is a stepping stone. The observation that there is a
natural size threshold below which inlining is strictly better — where
per-request overhead dominates the cost of any duplicate bytes — points
toward a size-based heuristic, analogous to how compilers choose an
inlining threshold. Small segments would be inlined automatically;
segments exceeding a byte threshold would be "outlined" into separate
requests where deduplication can take effect. For most applications,
this would require no configuration. For applications with specific
latency or bandwidth constraints, an option to adjust the threshold
would let developers tune their position on the requests-vs-bytes curve.
Adaptive heuristics based on network conditions are also possible,
though further out.
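The size-based heuristic could be sketched roughly as follows. This is hypothetical, not part of this PR: the threshold value, the `Segment` shape, and the function name are all invented for illustration.

```typescript
// Hypothetical sketch of the size-based heuristic, not shipped code.
// The threshold, the Segment shape, and the function name are all invented.
const INLINE_THRESHOLD_BYTES = 1024; // made-up default; could be user-tunable

interface Segment {
  path: string;
  sizeBytes: number;
}

// Segments at or under the threshold are inlined into the single response;
// larger segments are "outlined" into separate requests so that shared
// layouts can still be deduplicated across sibling routes.
function partitionSegments(segments: Segment[]): {
  inlined: Segment[];
  outlined: Segment[];
} {
  const inlined = segments.filter((s) => s.sizeBytes <= INLINE_THRESHOLD_BYTES);
  const outlined = segments.filter((s) => s.sizeBytes > INLINE_THRESHOLD_BYTES);
  return { inlined, outlined };
}

const { inlined, outlined } = partitionSegments([
  { path: "/dashboard", sizeBytes: 600 }, // small shared layout: inline
  { path: "/dashboard/settings", sizeBytes: 40_000 }, // large page: outline
]);
console.log(inlined.length, outlined.length); // 1 1
```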