cog
e628d786 - chore(coglet): Rust safety improvements and wheel platform fix (#2717)

Commit
9 days ago
chore(coglet): Rust safety improvements and wheel platform fix (#2717)

* chore(coglet): pass transport info from Init

* chore(coglet): add Fatal IPC message and panic hook for worker shutdown

Any panic in the worker (including mutex poisoning) now:
- sends ControlResponse::Fatal { reason } to parent via panic hook
- parent poisons all slots and fails in-flight predictions
- worker aborts (no unwinding through FFI/Python)

.expect() at lock sites is the correct idiom: the panic hook handles IPC
notification and hard abort automatically.

* chore(coglet): move slot poisoning to pool level

Slot poisoning is a property of the slot, not the prediction. A slot can be
poisoned regardless of whether a prediction is active on it.

- Add PermitPool::poison(slot_id) with per-slot AtomicBool flags
- try_acquire/acquire skip poisoned permits (lazy filtering)
- PermitIdle::drop checks poison flag, skips returning to pool
- Orchestrator poisons slots directly on Failed/Fatal via pool
- Remove slot_poisoned from Prediction (wrong abstraction)
- Remove PredictionSlot::into_poisoned (service always uses into_idle)
- Service send-failure path poisons at pool level before into_idle

* chore(coglet): coalesce concurrent healthcheck requests

Replace the semaphore + fake-healthy-when-in-progress pattern with proper
request coalescing. All concurrent callers now wait on the same in-flight
healthcheck and receive the real result.

- Remove healthcheck_semaphore from OrchestratorHandle
- Event loop uses Vec<Sender> instead of Option<Sender>
- Only one ControlRequest::Healthcheck sent to worker at a time
- Result broadcast to all waiters when it arrives
- Timed-out callers drop silently; healthcheck continues for others
- Fatal/crash paths fail all pending healthcheck waiters

* fix(cog): select coglet wheel by Docker target platform

The coglet wheel is a native Rust extension (platform-specific), but wheel
selection had no platform awareness: it globbed coglet-*.whl and took the
first match alphabetically. On ARM Mac with both wheels in dist/, the macOS
wheel sorted before the manylinux wheel and was incorrectly picked for a
Linux Docker build.

Additionally, StandardGenerator.GOOS/GOARCH were initialized from
runtime.GOOS (both, a copy-paste typo), giving "darwin"/"darwin" on Mac
instead of the Docker build target "linux"/"amd64".

Changes:
- Add wheelPlatformTag() mapping GOARCH to wheel platform substring
- Add filterWheelsByPlatform() to filter wheel paths by platform tag
- Thread platformTag through findWheelInDist/findWheelInDistSilent
- GetCogletWheelConfig now takes targetArch parameter
- Fix StandardGenerator GOOS/GOARCH to match Docker target (linux/amd64)
- Cog SDK wheel (py3-none-any) passes empty platformTag (no filtering)
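
The wheel selection fix described above can be illustrated with a minimal Go sketch. The helper names wheelPlatformTag and filterWheelsByPlatform come from the commit message, but their signatures, bodies, and the arch-to-tag mapping here are assumptions made for illustration, not the actual cog implementation. The wheel filenames and version number are likewise hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// wheelPlatformTag maps a Go architecture name to a substring expected in a
// Linux wheel filename. The mapping values are an assumption for this sketch.
func wheelPlatformTag(goarch string) string {
	switch goarch {
	case "amd64":
		return "x86_64"
	case "arm64":
		return "aarch64"
	default:
		return "" // unknown arch: fall back to no filtering
	}
}

// filterWheelsByPlatform keeps only wheel paths whose filename looks like a
// Linux wheel for the given tag. An empty tag disables filtering, matching
// the commit's note that pure-Python (py3-none-any) wheels pass an empty tag.
func filterWheelsByPlatform(paths []string, platformTag string) []string {
	if platformTag == "" {
		return paths
	}
	var matched []string
	for _, p := range paths {
		name := filepath.Base(p)
		if strings.Contains(name, "linux") && strings.Contains(name, platformTag) {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	// Hypothetical dist/ contents on a Mac that holds both wheels.
	wheels := []string{
		"dist/coglet-0.1.0-cp310-abi3-manylinux_2_17_x86_64.whl",
		"dist/coglet-0.1.0-cp310-abi3-macosx_11_0_arm64.whl",
	}
	sort.Strings(wheels)

	// Old behaviour: first alphabetical match, which is the macOS wheel.
	fmt.Println("first alphabetically:", wheels[0])

	// New behaviour: filter by the Docker target architecture first.
	filtered := filterWheelsByPlatform(wheels, wheelPlatformTag("amd64"))
	fmt.Println("after platform filter:", filtered)
}
```

Filtering before picking the first match means the result no longer depends on how "macosx" and "manylinux" happen to sort relative to each other.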