Make Worker interface non-blocking
As step one in support for concurrent prediction execution (e.g. for
batching models), this change makes the `Worker` interface non-blocking,
bringing it closer to interfaces like
`concurrent.futures.ThreadPoolExecutor`, which do a similar but much
more generic job.
The blocking interface this replaces was conceptually simpler, but given
that the worker was primarily used through a non-blocking HTTP
interface (the `Prefer: respond-async` header is what Replicate uses
when running Cog predictions in production) we had to bend over
backwards to use it. In particular, that meant:
- the worker had to yield heartbeat events to return control to the
caller periodically
- we had to create another multi-threaded component, `PredictionRunner`,
to present a non-blocking interface over the top of the blocking
worker interface
In this commit, changes are restricted to `Worker`'s interface, and we
hack together whatever we need to in `PredictionRunner` to keep tests
passing. A future commit will replace the runner code altogether.
Both `setup` and `predict` now return `concurrent.futures.Future`
objects, which complete when the corresponding setup or prediction run
finishes. Heartbeat events are removed altogether.
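As a rough illustration of the calling pattern only, here is a minimal sketch. `ToyWorker` is a hypothetical stand-in, not the real `Worker`: it runs work on a thread via `ThreadPoolExecutor`, and its constructor and `shutdown` method are assumptions made for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyWorker:
    """Hypothetical stand-in for Worker; a thread replaces the real worker."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._pool = ThreadPoolExecutor(max_workers=1)

    def setup(self) -> Future:
        # Returns immediately; the future completes once setup is done.
        return self._pool.submit(lambda: None)

    def predict(self, payload: dict) -> Future:
        # Returns immediately; the future completes with the prediction result.
        return self._pool.submit(self._predict_fn, payload)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


worker = ToyWorker(lambda payload: payload["text"].upper())
worker.setup().result(timeout=5)

future = worker.predict({"text": "hello"})
# The caller is free to do other work here; no heartbeat events are needed
# to hand control back.
output = future.result(timeout=5)
worker.shutdown()
```

The caller decides when (or whether) to block on `result()`, which is what lets an async HTTP layer sit directly on top of this kind of interface.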
Consumers of `Worker` are expected to use its `subscribe` method to
receive all the events emitted during a setup or predict run.
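The subscription pattern might look roughly like the sketch below. Everything in it is an assumption for illustration: `ToyEventSource`, the `subscribe`/`unsubscribe` signatures, and the tuple-shaped events are invented here and may not match the real `Worker`.

```python
from typing import Any, Callable, Dict


class ToyEventSource:
    """Hypothetical event publisher; not the real Worker implementation."""

    def __init__(self) -> None:
        self._subscribers: Dict[int, Callable[[Any], None]] = {}
        self._next_id = 0

    def subscribe(self, callback: Callable[[Any], None]) -> int:
        # Register a callback and return an id usable for unsubscribing.
        sid = self._next_id
        self._next_id += 1
        self._subscribers[sid] = callback
        return sid

    def unsubscribe(self, sid: int) -> None:
        del self._subscribers[sid]

    def _publish(self, event: Any) -> None:
        # Copy the callbacks so a subscriber may unsubscribe while handling.
        for callback in list(self._subscribers.values()):
            callback(event)

    def predict(self, text: str) -> None:
        # Event names here are made up for the example.
        self._publish(("log", f"processing {text!r}"))
        self._publish(("output", text.upper()))
        self._publish(("done", None))


received = []
source = ToyEventSource()
sid = source.subscribe(received.append)
source.predict("hi")
source.unsubscribe(sid)
source.predict("ignored")  # No subscribers left, so nothing is recorded.
```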
This also addresses an oversight that's been here since `Worker` was
first written: we now record something useful in the prediction logs if
a `BaseException` is raised by predict.
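To illustrate the idea rather than the actual implementation, the sketch below catches `BaseException` (which also covers `SystemExit` and `KeyboardInterrupt`, not just `Exception` subclasses), appends a summary to the logs, and fails the future. The helper name `run_and_record` and the list-based `logs` are hypothetical.

```python
import traceback
from concurrent.futures import Future


def run_and_record(predict, logs: list) -> Future:
    """Hypothetical helper: run predict, logging anything it raises."""
    future: Future = Future()
    try:
        future.set_result(predict())
    except BaseException as exc:  # deliberately broader than Exception
        # Record a one-line summary so the failure is visible in the logs.
        summary = "".join(traceback.format_exception_only(type(exc), exc))
        logs.append(summary.strip())
        future.set_exception(exc)
    return future


def failing_predict():
    raise SystemExit("model exited")


logs = []
failed = run_and_record(failing_predict, logs)
error = failed.exception()
```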
Co-Authored-By: F <f@replicate.com>