Send serialized ASTs to parallel workers (#20991)
This way we can properly benefit from the native parser in parallel type
checking. Self-check is now ~2.2x faster with 5 workers than with
in-process checking (also using the native parser). It also uses less
memory, although with 5 workers self-check still takes about twice as
much memory as in-process checking.
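A minimal sketch of the idea in plain Python. It assumes `ast.parse` and `pickle` as stand-ins for mypy's native parser and its AST serialization format (neither is what mypy actually uses). The names `parse_and_serialize`, `check_serialized` and `check_in_parallel` are hypothetical. In this sketch each module is parsed once in the coordinator process, and workers only deserialize bytes, so no worker re-parses source text.

```python
import ast
import pickle
from concurrent.futures import ProcessPoolExecutor


def parse_and_serialize(path: str, source: str) -> bytes:
    """Parse once in the coordinator and serialize the resulting tree.

    Assumption: `ast.parse` stands in for mypy's native parser.
    """
    tree = ast.parse(source, filename=path)
    return pickle.dumps((path, tree))


def check_serialized(blob: bytes) -> tuple[str, list[str]]:
    """Worker entry point: rebuild the AST from bytes, no re-parsing.

    The "check" is a toy stand-in for type checking: it only collects
    the names of all (async) functions defined in the module.
    """
    path, tree = pickle.loads(blob)
    names = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return path, sorted(names)


def check_in_parallel(sources: dict[str, str], workers: int = 5) -> dict[str, list[str]]:
    # Parsing happens once, up front; workers receive only serialized trees.
    blobs = [parse_and_serialize(p, s) for p, s in sources.items()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_serialized, blobs))
```

When running this as a script, call `check_in_parallel(...)` under an `if __name__ == "__main__":` guard, since `ProcessPoolExecutor` may start workers by re-importing the main module.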
The implementation is mostly straightforward. The GC freeze hack needed
some tuning, as there is no longer a single allocation hotspot.
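For context, Python's `gc.freeze()` moves every object the cyclic collector currently tracks into a permanent generation that future collections ignore. The helper below, `frozen_allocations`, is hypothetical and is not the actual hack in mypy. It sketches one way to apply the idea around each allocation-heavy phase instead of around a single hotspot: pause the collector during the phase, then freeze whatever was allocated.

```python
import gc
from contextlib import contextmanager


@contextmanager
def frozen_allocations():
    """Hypothetical helper: pause the cyclic GC during an allocation-heavy
    phase, then move all tracked objects into the permanent generation so
    later collections do not traverse them again."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        gc.freeze()  # everything tracked so far is skipped by future collections
        if was_enabled:
            gc.enable()
```

Used as `with frozen_allocations(): ...` around each phase that builds many long-lived objects. `gc.unfreeze()` moves the frozen objects back into the oldest generation if they later need to be collected.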
Note: do _not_ use `maturin develop` for any performance measurements,
as it builds a very slow wheel.