fix: eliminate dual prediction state, wire webhooks from single source of truth (#2780)
* fix: eliminate dual prediction state, wire webhooks from single source of truth
Collapse PredictionSupervisor into PredictionService to eliminate the
dual state tracking that caused terminal webhooks to have empty logs
and intermediate webhooks to never fire.
Before: PredictionState (supervisor) was a stale copy that never received
logs or outputs, so webhooks fired from the supervisor had empty payloads.
After: Prediction (orchestrator) is the single source of truth.
Webhook methods fire directly from Prediction mutation methods
(set_processing, set_succeeded, append_log, append_output, etc.)
where real-time state lives.
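A minimal sketch of the "After" shape, to illustrate why firing from the
mutation methods gives webhooks full payloads. This is not the actual
implementation: WebhookEvent, Payload, RecordingSink, and the field
layout are hypothetical stand-ins, and the real code sends over HTTP
rather than recording in memory. Only the mutation method names come
from this change.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical event kinds; the real names may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum WebhookEvent {
    Start,
    Logs,
    Output,
    Completed,
}

/// Hypothetical snapshot of the prediction sent with each webhook.
#[derive(Debug, Clone, PartialEq)]
pub struct Payload {
    pub event: WebhookEvent,
    pub status: String,
    pub logs: String,
    pub outputs: Vec<String>,
}

/// Stand-in for an HTTP sender: it only records what would be sent.
#[derive(Default, Clone)]
pub struct RecordingSink {
    pub sent: Arc<Mutex<Vec<Payload>>>,
}

impl RecordingSink {
    fn send(&self, payload: Payload) {
        self.sent.lock().unwrap().push(payload);
    }
}

/// The single owner of prediction state. There is no second copy to go stale.
pub struct Prediction {
    status: String,
    logs: String,
    outputs: Vec<String>,
    sink: RecordingSink,
}

impl Prediction {
    pub fn new(sink: RecordingSink) -> Self {
        Prediction {
            status: "starting".to_string(),
            logs: String::new(),
            outputs: Vec::new(),
            sink,
        }
    }

    /// Build the payload from live state, so it always reflects the
    /// logs and outputs accumulated so far.
    fn fire(&self, event: WebhookEvent) {
        self.sink.send(Payload {
            event,
            status: self.status.clone(),
            logs: self.logs.clone(),
            outputs: self.outputs.clone(),
        });
    }

    pub fn set_processing(&mut self) {
        self.status = "processing".to_string();
        self.fire(WebhookEvent::Start);
    }

    pub fn append_log(&mut self, chunk: &str) {
        self.logs.push_str(chunk);
        self.fire(WebhookEvent::Logs);
    }

    pub fn append_output(&mut self, item: &str) {
        self.outputs.push(item.to_string());
        self.fire(WebhookEvent::Output);
    }

    pub fn set_succeeded(&mut self) {
        self.status = "succeeded".to_string();
        self.fire(WebhookEvent::Completed);
    }
}
```

Because every payload is built inside the same struct that is being
mutated, the terminal webhook in this sketch carries all logs and
outputs appended before set_succeeded() was called.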
Also applies the block_in_place fix for log forwarder starvation: it
tells tokio that the predict thread will block on the GIL, so
log_forwarder tasks can be scheduled on other threads instead of their
output being batched at the end.
Net: -176 lines, supervisor.rs deleted entirely.
* fix: include streaming outputs in GET response, exempt Output webhooks from throttling
- get_prediction_response() now falls through to the streaming outputs vec
(same logic as the webhook payload builder) so GET /predictions/:id shows
intermediate output while the prediction is still running
- Output webhook events bypass the 500ms throttle: they are high-value,
infrequent, and were effectively unthrottled in the old Python runtime
where file uploads were synchronous
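A minimal sketch of the throttling exemption described above, under
stated assumptions. Throttle, should_send, and the two-variant
WebhookEvent are hypothetical names, not the real types. The sketch
also assumes an Output event does not reset the throttle window for
later Logs events; the actual code may behave differently. Only the
500ms interval and the Output bypass come from this change.

```rust
use std::time::{Duration, Instant};

/// Hypothetical subset of event kinds, enough to show the exemption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WebhookEvent {
    Logs,
    Output,
}

/// Hypothetical throttle: allows at most one throttled event per interval.
pub struct Throttle {
    interval: Duration,
    last_sent: Option<Instant>,
}

impl Throttle {
    pub fn new(interval: Duration) -> Self {
        Throttle { interval, last_sent: None }
    }

    /// Returns true if a webhook for `event` should be sent at `now`.
    pub fn should_send(&mut self, event: WebhookEvent, now: Instant) -> bool {
        // Output events bypass the throttle entirely.
        // Assumption: they also leave the throttle window untouched.
        if event == WebhookEvent::Output {
            return true;
        }
        let allowed = match self.last_sent {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.interval,
        };
        if allowed {
            self.last_sent = Some(now);
        }
        allowed
    }
}
```

Passing `now` in as a parameter rather than calling Instant::now()
inside should_send keeps the sketch deterministic to exercise: with a
500ms interval, a second Logs event 100ms after the first is dropped,
while Output events at the same instant still go through.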