[turbopack] shrink the size of futures (#93474)
### What?
Move the `RawVc::to_non_local` step out of every per-task future and into the executor's outer state machine, after `wait_for_local_tasks().await`. Inline the now-trivial `output_try_into_non_local_raw_vc` helper at the six `task_fn_impl!` callsites.
### Why?
Two wins, both small per task but compounding across the millions of task invocations a real workload (e.g. `next build`) makes.
**1. Smaller per-task futures.** On canary, every `#[turbo_tasks::function]` future carried an `output_try_into_non_local_raw_vc` suspension state (a 152-byte `__awaitee` containing `RawVc::to_non_local()`'s 128-byte awaitee) regardless of arity, sync/async, or whether the user body had any internal awaits. Measured with `-Zprint-type-sizes` on representative noop tasks:
| Task | canary | this branch | shrink |
|---|---|---|---|
| `noop_arity0` (sync) | 160 B | **1 B** | -99% |
| `noop_arity0_async` | 160 B | **32 B** | -80% |
| `noop_arity1` (sync) | 168 B | **16 B** | -90% |
| `noop_arity1_async` | 168 B | **56 B** | -67% |
| `noop_arity2` (sync) | 176 B | **24 B** | -86% |
| `noop_arity2_async` | 176 B | **80 B** | -55% |
| `busy_turbo` (sync, `u64`+`Duration`) | 184 B | **32 B** | -83% |
For sync `FunctionMode`/`MethodMode` impls, removing the only await point collapses the state machine to a shape with no suspension states (`Unresumed | Returned | Panicked`); the 1-byte size for sync arity-0 is just the discriminant. Smaller futures mean smaller `Box::pin` allocations (often landing in a smaller jemalloc size class) and less memory traffic on first poll.
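The effect can be reproduced outside turbo-tasks with plain `async fn`s and `std::mem::size_of_val`. This is a minimal model, not the PR's measurement setup: `to_non_local_like`, `task_with_conversion`, and `task_without_conversion` are hypothetical stand-ins, with a 128-byte buffer held across an await to mimic the old conversion awaitee. Exact byte counts depend on the compiler version, so the example prints them rather than hard-coding them.

```rust
use std::future::Future;
use std::mem::size_of_val;

/// Hypothetical stand-in for the old async conversion step: it keeps a
/// 128-byte buffer alive across an await point, so its state machine
/// must reserve space for those bytes.
async fn to_non_local_like(x: u64) -> u64 {
    let scratch = [x as u8; 128];
    std::future::ready(()).await; // suspension point; `scratch` is live across it
    scratch.iter().map(|&b| u64::from(b)).sum()
}

/// Old shape: the body itself is trivial, but the future also awaits the
/// conversion, so it embeds the conversion's whole state as its awaitee.
async fn task_with_conversion() -> u64 {
    let output = 1;
    to_non_local_like(output).await
}

/// New shape: no await at all, so the state machine only needs a
/// discriminant (Unresumed | Returned | Panicked).
async fn task_without_conversion() -> u64 {
    1
}

/// Size of a future value without polling it.
fn future_size<F: Future>(f: &F) -> usize {
    size_of_val(f)
}

fn main() {
    let old = task_with_conversion();
    let new = task_without_conversion();
    println!("with conversion:    {} B", future_size(&old));
    println!("without conversion: {} B", future_size(&new));
}
```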
**2. One fewer `turbo_tasks()` call per task execution.** The async `RawVc::to_non_local` ran `let tt = turbo_tasks();`, a thread-local lookup, on every per-task future poll that touched a local output. The new `to_non_local_unchecked_sync` instead takes `&dyn TurboTasksApi` directly from the executor scope, where the reference is already in hand.
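A minimal sketch of the calling-convention change, assuming a hypothetical `TasksApi` trait in place of `TurboTasksApi` and modelling the ambient lookup as a `thread_local!` slot. `current_api`, `convert_via_lookup`, and `convert_with` are illustrative names, not turbo-tasks functions.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical, trimmed-down stand-in for `TurboTasksApi`.
trait TasksApi {
    fn name(&self) -> &'static str;
}

struct Backend;

impl TasksApi for Backend {
    fn name(&self) -> &'static str {
        "backend"
    }
}

thread_local! {
    // Hypothetical per-thread slot, analogous to what `turbo_tasks()` reads.
    static CURRENT: RefCell<Option<Rc<dyn TasksApi>>> = RefCell::new(None);
}

/// Old pattern: every call pays for a thread-local access (plus an Rc clone here).
fn current_api() -> Rc<dyn TasksApi> {
    CURRENT.with(|slot| slot.borrow().clone().expect("no API installed on this thread"))
}

fn convert_via_lookup() -> &'static str {
    let tt = current_api();
    tt.name()
}

/// New pattern: the caller already holds the reference and passes it down.
fn convert_with(tt: &dyn TasksApi) -> &'static str {
    tt.name()
}

fn main() {
    let api: Rc<dyn TasksApi> = Rc::new(Backend);
    CURRENT.with(|slot| *slot.borrow_mut() = Some(api.clone()));

    println!("{}", convert_via_lookup());
    // Executor-style call: reuse the reference already in scope.
    println!("{}", convert_with(&*api));
}
```

In this sketch, taking `&dyn TasksApi` as a parameter also makes the dependency visible in the signature, so the synchronous path can only be called from a scope that already holds the reference.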
### How?
- `output_try_into_non_local_raw_vc` is no longer async; it just returns `output.try_into_raw_vc()`. Inlined at all six callsites in `task_fn_impl!` and deleted.
- `RawVc::to_non_local_unchecked_sync` is added: a synchronous variant that takes `&dyn TurboTasksApi` and assumes the local output is already populated. The executor calls it after `wait_for_local_tasks().await`, where that invariant holds. The `unreachable!()` on the not-ready path documents the contract.
- The original async `to_non_local` is kept (now `pub`) so the testing harness can do the conversion explicitly, since it doesn't go through the same executor path.
- `NativeFunction::execute`'s doc gains a note that it may now return a local Vc.
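The executor-side ordering can be modelled as follows. This is a simplified, synchronous sketch under stated assumptions: `RawVc` is reduced to two variants, and `TasksApi`, `Executor`, `try_read_local_output`, and `execute_task` are hypothetical. The real executor awaits `wait_for_local_tasks()` inside an async state machine rather than calling a blocking method. What the sketch shows is the contract: the task body may return a local Vc, and the conversion runs once, after local tasks finish, so the not-ready branch is `unreachable!()`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified model of `RawVc`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RawVc {
    TaskOutput(u32),
    LocalOutput(u32),
}

/// Hypothetical stand-in for `TurboTasksApi`.
trait TasksApi {
    /// Task id backing a local output, or `None` if its local task is unfinished.
    fn try_read_local_output(&self, local: u32) -> Option<u32>;
    fn wait_for_local_tasks(&self);
}

struct Executor {
    pending: RefCell<HashMap<u32, u32>>,  // local tasks not yet run
    finished: RefCell<HashMap<u32, u32>>, // local outputs ready to read
}

impl TasksApi for Executor {
    fn try_read_local_output(&self, local: u32) -> Option<u32> {
        self.finished.borrow().get(&local).copied()
    }

    fn wait_for_local_tasks(&self) {
        // Simplified: "run" every pending local task to completion.
        let done: Vec<_> = self.pending.borrow_mut().drain().collect();
        self.finished.borrow_mut().extend(done);
    }
}

impl RawVc {
    /// Synchronous conversion. Caller contract: all local tasks have
    /// completed, so a local output is always readable.
    fn to_non_local_unchecked_sync(self, tt: &dyn TasksApi) -> RawVc {
        match self {
            RawVc::LocalOutput(id) => match tt.try_read_local_output(id) {
                Some(task) => RawVc::TaskOutput(task),
                None => unreachable!("local output {} read before wait_for_local_tasks", id),
            },
            other => other,
        }
    }
}

/// Outer executor step: run the conversion-free task body, wait for local
/// tasks, then convert the result once with `tt` already in hand.
fn execute_task(tt: &dyn TasksApi, body: impl FnOnce() -> RawVc) -> RawVc {
    let output = body(); // may be a local Vc
    tt.wait_for_local_tasks();
    output.to_non_local_unchecked_sync(tt)
}

fn main() {
    let tt = Executor {
        pending: RefCell::new(HashMap::from([(7, 42)])),
        finished: RefCell::new(HashMap::new()),
    };
    let out = execute_task(&tt, || RawVc::LocalOutput(7));
    println!("{:?}", out);
}
```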
The `to_non_local` semantics are unchanged — just relocated.
<!-- NEXT_JS_LLM_PR -->