cog
327a4908 - Install SDK from PyPI and refactor release publishing (#2691)

Commit
1 day ago
Install SDK from PyPI and refactor release publishing (#2691)

* chore: update mise.lock with gotestsum

* feat: configure coglet publishing to crates.io and PyPI

  Crates.io (coglet crate):
  - Add description, homepage, documentation, keywords, categories
  - coglet-python marked as publish = false (PyPI only)

  PyPI (coglet wheel):
  - Add pypi-coglet.yaml workflow for multi-platform wheel builds
  - Builds for: Linux x64, Linux ARM64, macOS x64, macOS ARM64
  - Test PyPI on main branch pushes
  - Production PyPI on version tags
  - Crates.io publish on version tags

  Update coglet-python pyproject.toml with full PyPI metadata:
  - License, authors, keywords, classifiers, project URLs

* ci: add release dry-run validation after CI complete

  - Runs cargo publish --dry-run for coglet crate
  - Runs goreleaser check to validate config
  - Only runs on PRs/main (not tags)
  - Actual release only runs on tags

* feat: add release-build and release-publish workflows

  release-build.yaml (triggers on tags):
  - Verifies tag is on main branch
  - Builds SDK wheel (pure Python)
  - Builds coglet wheels for all 4 platforms
  - Creates draft GitHub release with artifacts

  release-publish.yaml (triggers when release is published):
  - Verifies publisher is different from tag author
  - Publishes SDK to PyPI
  - Publishes coglet wheels to PyPI
  - Publishes coglet crate to crates.io
  - Runs goreleaser for CLI binaries

  Security:
  - Two-person approval: tag pusher cannot publish
  - Environment secrets restricted to v* tags
  - Draft release acts as manual approval gate

* chore: remove old pypi workflows

  Replaced by consolidated release-build and release-publish workflows.

* refactor: remove tag handling from CI workflow

  Tags are now handled by release-build.yaml, so:
  - Remove tags from CI triggers
  - Remove tag-specific change detection
  - Remove tag conditions from build jobs
  - Keep release-dry-run for PR/main validation

* feat: add optional coglet dependency to cog SDK

  Users can install the Rust prediction server with:

      pip install cog[rust]

  This is an alpha feature that will become the default once validated.

* refactor: replace embedded SDK wheel with PyPI-based installation

  Remove go:embed and go:generate directives from the wheels package.
  The CLI now installs cog from PyPI by default instead of embedding
  the wheel at build time.

  New wheel resolution:
  - WheelSourcePyPI replaces WheelSourceEmbedded
  - Auto-detect dist/cog-*.whl for dev builds (Version == "dev")
  - Support for pypi, pypi:version, dist keywords
  - Separate GetCogWheelConfig() and GetCogletWheelConfig()

  Environment variables:
  - COG_WHEEL: pypi, pypi:0.12.0, dist, URL, or file path
  - COGLET_WHEEL: same pattern (optional, nil by default)

* refactor: update dockerfile generator for PyPI-based cog installation

  - Replace installEmbeddedCogWheel with installCogFromPyPI
  - Add installCogletWheel, installCogletFromPyPI, installCogletFromURL,
    installCogletFromFile methods
  - Update tests to expect PyPI installation instead of embedded wheel
  - Remove tests for embedded wheel behavior

* chore: remove go generate for wheel embedding

  - Remove generate:go task and all dependencies on it
  - Remove go generate hook from goreleaser
  - Update Makefile generate target
  - CLI can now be built independently without building SDK first

* feat: rename cog[rust] to cog[coglet] with version constraint

  - Rename optional dependency from 'rust' to 'coglet'
  - Add version constraint: coglet>=0.1.0,<1.0
  - Ensures SDK shim compatibility between cog and coglet releases

* refactor: simplify release workflow security model

  - Remove two-person reviewer requirement
  - Use GitHub environment protection rules for maintainer-only access
  - Ensure coglet published to PyPI BEFORE SDK (dependency order)
  - Remove SDK wheel download (no longer embedded in CLI)

* docs: add release workflow documentation to CI README

  - Document two-workflow release system (build + publish)
  - Add ASCII diagrams for release flow
  - Document SDK wheel sourcing and COG_WHEEL env var
  - Add GitHub environment setup instructions
  - Add step-by-step release guide

* feat: lockstep coglet version constraint at build time

  Update pyproject.toml during SDK build to require coglet>=X.Y.Z
  matching the release tag version. This ensures cog and coglet
  releases are always compatible - SDK changes can depend on coglet
  features from the same release.

* fix: run gotestsum directly in CI for proper cancellation handling

  Mise wraps commands in a way that prevents GHA from properly
  terminating test processes on workflow cancellation. Tests would
  continue running until timeout (30min) even after clicking cancel.

  Running gotestsum directly allows GHA's signal handling to work,
  so cancelled jobs terminate promptly.

* ci: remove SDK build dependency from Go lint and test

  Go lint and test no longer need the SDK wheel since the CLI doesn't
  embed it. This allows Go jobs to run in parallel with SDK build,
  reducing CI time.

* fix: detect snapshot versions as dev builds for wheel resolution

  Goreleaser sets Version to snapshot strings like
  '0.16.12-dev+g6793b492' not just 'dev'. These should not try to
  install from PyPI with version since they don't exist there.

  isDevVersion() now checks for:
  - 'dev' (exact match)
  - contains '-dev' (snapshot)
  - contains '+' (local version identifier)

* fix: use process group trap for proper CI cancellation

  Use trap to catch INT/TERM and kill the entire process group.
  This ensures go test children are killed when GHA cancels the workflow.

* fix: robust CI test cancellation with SIGTERM then SIGKILL

  Use job control (set -m) and process group signals for proper cleanup:
  1. Trap INT/TERM signals
  2. Send SIGTERM to process group
  3. Wait 5 seconds for graceful shutdown
  4. Send SIGKILL as backstop

  Also replace always() with !cancelled() in test-integration so that
  the job respects workflow cancellation requests.

* fix: lint errors and use golangci-lint-action

  - Use 0o755/0o644 octal literals
  - Remove ineffectual assignments in ParseWheelValue
  - Use golangci/golangci-lint-action@v8 for better caching and annotations

* chore: upgrade to golangci-lint v2

  - Migrate .golangci.yaml to v2 format
  - Remove deprecated config options (deadline, skip-dirs, etc.)
  - Fix lint issues: duplicate imports, embedded field access,
    code simplifications
  - Exclude ST1005 (error capitalization) - 127 issues to fix incrementally
  - Add exclusions for defer cleanup patterns (RemoveAll, Close)

* chore: add .coverage to gitignore

* chore: use aqua backend for zig (faster install)

* fix: use aqua rustup for consistent rust toolchain management

  Remove CARGO_HOME override and use aqua backend for rustup, which
  ensures consistent toolchain installation across developers.

* ci: disable rustup and rustup-init in MISE_DISABLE_TOOLS

* chore: use uv for python management instead of mise

* chore: fix deprecated goreleaser options

  - archives.format -> archives.formats
  - archives.builds -> archives.ids

* fix: restore original error string capitalization

  ST1005 is now globally disabled, so these changes were unnecessary
  and some broke integration tests.

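The three isDevVersion() checks listed in the snapshot-version fix above can be sketched in Go as follows. This is a minimal illustration built only from the rules stated in the message ('dev' exact match, contains '-dev', contains '+'); the function body is an assumption and may differ from the code in the wheels package.

```go
package main

import (
	"fmt"
	"strings"
)

// isDevVersion reports whether v looks like a development or snapshot
// build that would not exist on PyPI. The three checks mirror the ones
// listed in the commit message; the real implementation may differ.
func isDevVersion(v string) bool {
	return v == "dev" || // plain dev build
		strings.Contains(v, "-dev") || // goreleaser snapshot, e.g. 0.16.12-dev+g6793b492
		strings.Contains(v, "+") // local version identifier
}

func main() {
	for _, v := range []string{"dev", "0.16.12-dev+g6793b492", "0.17.0"} {
		fmt.Printf("%-24s dev=%v\n", v, isDevVersion(v))
	}
}
```

Under these rules a release version such as 0.17.0 is the only one of the three samples treated as installable from PyPI by version.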
* chore: update coglet-python uv.lock

* feat: improve wheel path resolution with REPO_ROOT and clear error messages

  - COG_WHEEL=dist and COGLET_WHEEL=dist now use REPO_ROOT env var (if set)
    or fall back to git rev-parse to find the repository root
  - Relative paths are converted to absolute paths
  - Clear error messages when:
    - Wheel file not found
    - Not in a git repo and REPO_ROOT not set
    - dist/ directory doesn't contain expected wheel
  - Add logging when using local or URL wheels
  - Add REPO_ROOT to mise.toml (respects existing value in CI)
  - Document COG_WHEEL and COGLET_WHEEL in docs/environment.md
  - Add integration tests for wheel resolution

* feat: add coglet version validation and version-bump workflow

  CI changes:
  - Add version-check job that validates Cargo.toml version changes:
    - Valid semver format
    - Not an existing tag
    - Not a downgrade (must be >= highest released version)
  - Add version_only detection to skip heavy CI for version-bump-only PRs

  Release changes:
  - Remove sed version sync from release-build and release-publish
  - Version must be pre-set in crates/Cargo.toml before tagging
  - Release fails fast if Cargo.toml version != tag

  New version-bump workflow (workflow_dispatch):
  - Auto-bumps minor version if no input provided
  - Accepts patch or minor version input (major disallowed)
  - Creates PR with version bump and release instructions

* feat: enforce branch rules for releases

  - Stable releases (v0.17.0) must be tagged from main branch
  - Pre-releases (v0.17.0-alpha1) must be tagged from prerelease/* branch
  - Cargo.toml version must match tag in both cases

  Combined with GitHub rulesets:
  - Only maintainers can create prerelease/* branches
  - Tags are immutable once created

* chore: bump coglet version to 0.17.0

  Align coglet version with cog release versioning (lockstep).

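The accepted forms of COG_WHEEL and COGLET_WHEEL described above (pypi, pypi:version, dist, URL, or file path) can be classified roughly as in the sketch below. The type and function names (wheelSource, wheelConfig, parseWheelValue) are hypothetical and chosen for illustration; they are not the identifiers used in the repository, and details such as whitespace handling and URL detection are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// wheelSource is a hypothetical enum of where a wheel comes from.
type wheelSource int

const (
	sourcePyPI wheelSource = iota // "pypi" or "pypi:<version>"
	sourceDist                    // "dist": look in the repo's dist/ directory
	sourceURL                     // http(s) URL to a wheel
	sourceFile                    // anything else is treated as a file path
)

// wheelConfig is a hypothetical result type for a parsed env value.
type wheelConfig struct {
	Source  wheelSource
	Version string // set only for pypi:<version>
	Path    string // set only for URL and file sources
}

// parseWheelValue classifies an environment variable value. An empty
// value yields nil, matching the "optional, nil by default" behaviour
// described for COGLET_WHEEL.
func parseWheelValue(value string) *wheelConfig {
	v := strings.TrimSpace(value)
	switch {
	case v == "":
		return nil
	case v == "pypi":
		return &wheelConfig{Source: sourcePyPI}
	case strings.HasPrefix(v, "pypi:"):
		return &wheelConfig{Source: sourcePyPI, Version: strings.TrimPrefix(v, "pypi:")}
	case v == "dist":
		return &wheelConfig{Source: sourceDist}
	case strings.HasPrefix(v, "http://"), strings.HasPrefix(v, "https://"):
		return &wheelConfig{Source: sourceURL, Path: v}
	default:
		return &wheelConfig{Source: sourceFile, Path: v}
	}
}

func main() {
	for _, v := range []string{"pypi", "pypi:0.12.0", "dist", "./dist/cog.whl"} {
		fmt.Printf("%q -> %+v\n", v, *parseWheelValue(v))
	}
}
```

Resolving the dist and file cases to absolute paths (via REPO_ROOT or git rev-parse) would happen after this classification step and is left out of the sketch.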
* fix: propagate REPO_ROOT in integration test harness

  - Add REPO_ROOT to cmdCogServe env passthrough list
  - Add REPO_ROOT to pty-run env passthrough
  - Fix wheel_coglet_missing test to use explicit non-existent path
    instead of relying on coglet wheel not being built

* fix: reduce healthcheck drain timeout to prevent CI timeouts

  After a healthcheck timeout, the orphaned thread (from uninterruptable
  time.sleep) may still hold stdout/stderr streams. The previous 10s drain
  timeout meant worst case was 5s (healthcheck) + 10s (drain) = 15s, which
  exceeded the 10s HTTP client timeout in integration tests.

  Reduce drain timeout to 1s for healthchecks. Worst case is now
  5s + 1s = 6s, well within the 10s client timeout.

* Skip slow its (#2694)

* ci: skip slowest integration tests (~20min combined)

  Temporarily disable torch_baseimage_precompile (593s) and
  torch_baseimage_no_cog_base (585s) which together account for ~20min
  of the ~14min wall-clock IT run. These tests build torch base images
  without cog-base, pulling and installing everything from scratch.

  Will re-enable once ITs are sharded across multiple runners.

* ci: shard integration tests across multiple runners

  Distribute integration tests across NUM_IT_RUNNER_SHARDS runners
  (default 4). Slow tests (tagged with [short] skip) are distributed
  round-robin first to ensure they don't pile up on one runner, then
  fast tests fill in.

  Each runtime (cog, cog-rust) gets its own set of shards, so total
  runners = NUM_IT_RUNNER_SHARDS * 2. The shard count is a single env
  var at the top of ci.yaml for easy tuning.
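
The slow-first round-robin distribution described in the sharding change can be sketched as below. The function name assignShards and the test names are hypothetical, and the sketch assumes "fast tests fill in" means the fast tests continue the same rotation after the slow ones; the actual CI logic may balance them differently.

```go
package main

import "fmt"

// assignShards places slow tests round-robin across numShards first, so
// they do not pile up on one runner, then continues the same rotation
// with the fast tests. It returns nil if numShards is not positive.
func assignShards(slow, fast []string, numShards int) [][]string {
	if numShards <= 0 {
		return nil
	}
	shards := make([][]string, numShards)
	next := 0 // index of the shard that receives the next test
	for _, group := range [][]string{slow, fast} {
		for _, name := range group {
			shards[next] = append(shards[next], name)
			next = (next + 1) % numShards
		}
	}
	return shards
}

func main() {
	slow := []string{"slow_a", "slow_b", "slow_c"}
	fast := []string{"fast_1", "fast_2", "fast_3", "fast_4", "fast_5"}
	for i, shard := range assignShards(slow, fast, 4) {
		fmt.Printf("shard %d: %v\n", i, shard)
	}
}
```

With 4 shards per runtime and two runtimes (cog, cog-rust), a scheme like this would be evaluated once per runtime, giving the 8 runners mentioned in the message.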