WIP/RFC: Add `await` mechanism
# Introduction
This PR adds a new control flow mechanism called `await`. In this PR,
it is only exposed by the macro `@Base.Experimental.oc_await`, which
has the following docstring:
```
Base.Experimental.@oc_await [argt] [C->retblock]
Capture the current function's execution context for later resumption.
By default, immediately returns to the caller, returning an `OpaqueClosure` that
may be invoked to continue execution. If the optional `C->retblock` argument is
provided, `retblock` is executed in the context of the current function, with the
continuation bound to `C`. If `argt` is provided, the continuation will further
expect arguments `argt` to be provided when invoked.
```
Adding a feature like this was part of the original design of opaque closures,
but was never fully implemented for lack of immediate need. There are several
ways to think of this feature:
1. As an alternative representation for `:new_opaque_closure` that is
more friendly to other optimization passes.
2. As an implementation of a particular kind of delimited continuation
3. As an implementation of C++20-style coroutines
The key feature of this design is that the decision of which
values go in the capture list/residual/etc. is deferred all the way through
to the last possible moment in LLVM. As a result, all the ordinary optimizations
(DCE, SROA, various AD transforms, etc.) can be applied as usual across the
suspension boundary.
## Implementation status
As of the writing of this commit message, the implementation is minimal.
I've added lowering support, and the new IR node type, as well as support
in the interpreter to play with the semantics. However, there is no compiler
support or optimization support yet (so you need to run `julia --compile=min`
to play with it).
Semantic TODO:
- [ ] How does this mix with try/catch?
- [ ] Does `await` capture other task-bound state?
  - [ ] `scope` (yes?)
  - [ ] locks? (no?)
  - [ ] timing? (no?)
  - [ ] rng? (no?)
Representational TODO:
- [ ] How does `argt` get represented in the continuation?
Inference TODO:
- [ ] Implement AwaitNode inference support
Codegen TODO:
- [ ] Define and implement `julia.coro` intrinsics to lower this to
- [ ] Implement the appropriate lowering
Runtime TODO:
- [ ] Allow the OC captures to be allocated inline with a GC descriptor for pointers
## Detailed semantic discussion
### General semantic details
There are some semantic similarities with try/catch (in that they're both kinds of
continuations). However, the semantics are quite different:
1. Try/catch always jumps up the stack; `await` makes no such assumption (but copies
the state of the topmost stack frame, so there are two independent copies of it).
2. `await` is always delimited by `return` (which terminates the continuation).
3. `await` is multi-shot. However, I think single-shot is useful, so there is a
currently unused `flags` argument that might be used to ask for a single-shot
continuation.
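
A rough way to picture the multi-shot versus single-shot distinction, using ordinary Julia closures rather than `await` itself. `make_continuation` and `single_shot` are hypothetical helpers introduced only for illustration, and raising an error on a second invocation is just one assumed reading of what a single-shot `flags` value might mean; the PR does not specify it.

```julia
# Hypothetical emulation with plain closures; this does not use `await` itself.

# Multi-shot: every invocation starts from its own copy of the captured state.
function make_continuation(state::Vector{Int})
    captured = copy(state)          # snapshot taken at the "suspension point"
    return (v::Int) -> begin
        frame = copy(captured)      # fresh copy of the frame per invocation
        push!(frame, v)
        return sum(frame)
    end
end

# Assumed single-shot behavior: a second invocation is an error.
function single_shot(k)
    used = Ref(false)
    return (args...) -> begin
        used[] && error("single-shot continuation invoked twice")
        used[] = true
        return k(args...)
    end
end

k = make_continuation([1, 2, 3])
first_result = k(10)
second_result = k(10)   # the first call did not disturb the captured state

once = single_shot(k)
once_result = once(4)
second_call_failed = try
    once(4)
    false
catch
    true
end
```

Because each invocation works on its own copy of the frame, `first_result` and `second_result` agree, which is the property that distinguishes multi-shot from single-shot continuations.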
### Syntax level
This adds a new syntax form `(symbolicawait continue_at argt flags)`.
`continue_at` is a label name created with `symboliclabel`. The semantics
are that the execution of `symbolicawait` captures all local slots and
SSA values and returns an opaque closure that, when called, restores
all local slots and SSA values and resumes execution at the label `continue_at`.
Regular execution continues as usual at the next statement after `symbolicawait`.
Modifications to slots (or SSA values) after `symbolicawait` do not affect
the value of said slots/SSA values in the continuation.
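
The snapshot behavior described above can be approximated with a plain closure. This is only a sketch of the intended semantics, not the lowering in this PR: `emulated_await_demo` is a hypothetical name, and the capture is written out by hand instead of being produced by `symbolicawait`.

```julia
# Hypothetical emulation: models the described semantics of `symbolicawait`
# with a hand-written snapshot; it is not how the PR implements the feature.
function emulated_await_demo()
    x = 1
    # "Capture" the local slot by value at the suspension point.
    snapshot = (x = x,)
    continuation = () -> begin
        # Restore the captured slot, then run the code at the `continue_at` label.
        x_restored = snapshot.x
        return x_restored + 10
    end
    # Regular execution continues after the capture; this assignment must not
    # be visible inside the continuation.
    x = 100
    return continuation, x
end

k, x_after = emulated_await_demo()
resumed = k()
```

Here the continuation observes the value of `x` from the moment of capture, while the original frame goes on to see its own later assignment.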
### IR level
This adds a new `AwaitNode`. It is structurally similar to `EnterNode` in some
ways, in that it has a non-local successor that may later be jumped to.
The non-local successor in both `AwaitNode` (i.e. the continuation) and `EnterNode`
(i.e. the catch block) is a statement/bb index integer inside the struct. However,
there are also some differences:
1. `AwaitNode` is always delimited by `ReturnNode`; there are no equivalent `:leave` or
`:pop_exception` statements.
2. `AwaitNode` returns a regular value (an opaque closure), not a token. `AwaitNode`
may be DCE'd if there are no uses.
### LLVM level [unimplemented]
The rough plan is to implement something similar to `llvm.coro`, although we cannot
use it directly, since we need special handling for our GC-tracked pointers. However,
we may be able to borrow some code.
## Potential users
I have the following potential use cases in mind immediately, although the
mechanism is of course quite general.
In Base:
1. `Task`
2. The futures mechanism in `Compiler`
In downstream packages:
1. The carried residual in reverse-mode AD packages like Diffractor or Enzyme
(I have no direct insight into Enzyme, but since the plan is to expose this
down to the LLVM level, I imagine it could use it).
2. Carried state between torn partitions in DAECompiler.