Backport cog-runtime (#2583)
* initial commit of imported cog-runtime repo
* Integrate cog-runtime (coglet) into main repository
Move coglet runtime code from standalone cog-runtime repository into coglet/ directory:
- coglet/cmd/coglet-server/ - Go binary for container runtime
- coglet/internal/ - Go server packages (runner, server, webhook, etc.)
- coglet/python/coglet/ - Python SDK
- coglet/python/cog/ - Compatibility shim
- coglet/pyproject.toml - Python package config
Changes:
- Bump go.mod to go 1.25 (sync.WaitGroup.Go requires 1.25; os.Root requires 1.24 or later)
- Add Makefile targets: coglet-wheel, coglet-server-binaries, test-coglet-*
- Add CI jobs for coglet Go and Python tests
- Update import paths from github.com/replicate/cog-runtime/internal/*
to github.com/replicate/cog/coglet/internal/*
Original code from: https://github.com/replicate/cog-runtime
Phase 1 of cog-runtime integration (cog-l09 epic, cog-4gd task)
* fix: remove readme field from coglet pyproject.toml
The readme = '../README.md' path breaks CI because setuptools
doesn't allow referencing files outside the package directory.
* feat: use uv run for coglet Go tests instead of manual venv
Replace PythonBinPath config with PythonCommand []string to support
flexible Python invocation. This allows tests to use 'uv run --project'
which automatically manages the Python environment.
Changes:
- config: Replace PythonBinPath string with PythonCommand []string
- manager: Add buildPythonCmd() helper for command construction
- harness_test: Use 'uv run --project' for both coglet and legacy cog tests
- Remove manual .venv path construction and PYTHONPATH manipulation
- Add coglet/uv.lock for reproducible test environments
- Ignore auto-generated _version.py files from setuptools-scm
This eliminates the need for manual venv setup before running tests.
* fix: pre-create coglet venv in CI to avoid parallel uv sync
The coglet Go tests use 'uv run --project coglet' which creates
a venv on first run. When multiple tests start in parallel, they
may all try to create the venv simultaneously, causing hangs or
timeouts.
Pre-running 'uv sync' ensures the venv exists before tests start.
* fix: exclude coglet/ from test-go target to avoid CI hangs
The test-go CI job was including coglet/internal/tests which requires
uv to be set up for Python environment management. Since test-coglet-go
already runs these tests with proper uv setup, exclude all coglet/
packages from test-go to avoid 20-minute hangs in CI.
* fix: clean up pending map before sending webhook to avoid race condition
When a prediction completes, the terminal webhook was sent before
the pending map entry was deleted. This caused a race condition where
a webhook receiver starting a new prediction would see the runner
as having no capacity (pending entry still exists), leading to 500
errors in sequential prediction scenarios.
Reorder operations to delete from pending map first, then send webhook.
This ensures findRunnerWithCapacity sees accurate capacity when new
predictions arrive.
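The ordering can be sketched as follows, assuming a simplified runner with a mutex-guarded pending map. The Runner type, its methods, and the callback-style webhook are hypothetical and only illustrate "free the slot first, notify second":

```go
package main

import (
	"fmt"
	"sync"
)

// Runner is a hypothetical, simplified runner that tracks in-flight predictions.
type Runner struct {
	mu            sync.Mutex
	pending       map[string]struct{}
	maxConcurrent int
}

func NewRunner(maxConcurrent int) *Runner {
	return &Runner{pending: make(map[string]struct{}), maxConcurrent: maxConcurrent}
}

// Start reserves a slot for id and reports whether there was capacity.
func (r *Runner) Start(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.pending) >= r.maxConcurrent {
		return false
	}
	r.pending[id] = struct{}{}
	return true
}

// Complete frees the slot before sending the terminal webhook, so a receiver
// that reacts by starting a new prediction already sees the freed capacity.
func (r *Runner) Complete(id string, sendWebhook func(id string)) {
	r.mu.Lock()
	delete(r.pending, id)
	r.mu.Unlock()
	// The webhook runs outside the lock and after the slot is released.
	sendWebhook(id)
}

func main() {
	r := NewRunner(1)
	r.Start("p1")
	r.Complete("p1", func(id string) {
		// Simulates a webhook receiver that immediately submits the next prediction.
		fmt.Println("next prediction accepted:", r.Start("p2"))
	})
}
```

With the reverse order (webhook first, delete second), the Start call inside the callback would be rejected because the runner would still look full.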
* fix: limit coglet test parallelism to avoid resource exhaustion
The coglet tests spawn Python subprocesses for each test case. Running
too many in parallel causes resource exhaustion (OOM kill) in CI.
Limit parallelism to 4 to prevent this.
* fix: resolve golangci-lint errors in coglet code
- Add gosec nolint directives for trusted subprocess and HTTP calls
- Refactor handlePath() to use type switch instead of if-else chain
- Fix regex match validation to check len() instead of nil
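As an illustration of the regex item above: regexp's FindStringSubmatch returns nil when nothing matches, and len() of a nil slice is 0, so a length check covers the no-match case and also guards the capture-group indexing. The pattern and function below are hypothetical, not taken from the coglet code:

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRe is a hypothetical pattern with two capture groups.
var versionRe = regexp.MustCompile(`^(\d+)\.(\d+)$`)

// parseVersion returns the two captured groups, or ok=false when the
// input does not match.
func parseVersion(s string) (major, minor string, ok bool) {
	m := versionRe.FindStringSubmatch(s)
	// Expect the full match plus two groups. A nil result has length 0,
	// so this single check also handles "no match".
	if len(m) != 3 {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	fmt.Println(parseVersion("3.12"))
	fmt.Println(parseVersion("not-a-version"))
}
```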
* fix: prevent test cleanup from killing test process group
The killAllChildProcesses() function was using pgrep -f "coglet" which
matched the test binary path itself. When combined with syscall.Kill(-pid,
SIGKILL) which kills entire process groups, this was terminating the test
process during cleanup.
Add checks to:
- Skip processes in the same process group as the test
- Skip the parent process (ourPpid)
Also restore the original gotestsum test runner format.
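The two skip checks can be sketched as a pure predicate. The proc type and shouldKill name are hypothetical; in a real cleanup routine the IDs would come from calls such as os.Getppid and syscall.Getpgid, which are left out here to keep the example portable:

```go
package main

import "fmt"

// proc is a hypothetical record for a candidate process found during cleanup.
type proc struct {
	pid  int
	pgid int
}

// shouldKill reports whether cleanup may signal p's process group.
// It refuses when doing so could take down the test process itself.
func shouldKill(p proc, ourPpid, ourPgid int) bool {
	if p.pgid == ourPgid {
		// Same process group as the test: a group-wide kill would hit us too.
		return false
	}
	if p.pid == ourPpid {
		// Never kill the parent of the test process.
		return false
	}
	return true
}

func main() {
	ourPpid, ourPgid := 100, 200 // illustrative values only
	candidates := []proc{
		{pid: 201, pgid: 200}, // shares the test's process group
		{pid: 100, pgid: 100}, // the test's parent
		{pid: 300, pgid: 300}, // unrelated leftover child
	}
	for _, p := range candidates {
		fmt.Printf("pid=%d kill=%v\n", p.pid, shouldKill(p, ourPpid, ourPgid))
	}
}
```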
* fix: add CodeQL security annotations for coglet
Add #nosec annotations, keyed by gosec rule ID, to suppress CodeQL false positives for:
- G304 (path traversal): procedure.go:62, runner.go:866, runner.go:873
- G107 (SSRF): procedure.go:79, webhook.go:57
These are intentional behaviors in the coglet runtime which runs in
isolated containers with inputs from trusted orchestration systems.
Added TODO[md] comments for future validation improvements.
* chore: simplify SSRF TODO annotations in coglet
Remove #nosec annotations (CodeQL doesn't use them) and simplify TODO
comments for future SSRF protection work. These alerts will be dismissed
in the GitHub UI since the URLs come from a trusted orchestration layer.
* fix: use os.Root API for traversal-safe file operations in coglet
Refactor coglet path operations to use Go 1.24's os.Root API to prevent
path traversal attacks. This addresses CodeQL path injection alerts.
Changes:
- Add workingRoot field to RunnerContext for scoped file operations
- Add WriteFile/StatFile helper methods with fallback for tests
- Initialize workingRoot in manager.go when creating runners
- Close workingRoot in RunnerContext.Cleanup()
- Add path validation for file:// URLs in procedure.go using
filepath.Clean and filepath.EvalSymlinks
The os.Root API ensures file operations cannot escape the working
directory even with malicious path inputs like '../../../etc/passwd'.
* Revert "fix: use os.Root API for traversal-safe file operations in coglet"
This reverts commit ac6e3ba5413a424cdd5fc12b8b057849a1c32607.
* fix: add CodeQL suppression comments and config for path injection alerts
Add inline suppression comments with detailed rationale for known false
positive path injection alerts in coglet:
- runner.go: requestPath is constructed from controlled workingdir and
internally-generated prediction ID (not user input)
- procedure.go: file:// URLs only used in dev/testing, production uses
http/https from trusted sources
Also adds .github/codeql-config.yml documenting these decisions and
excluding test fixtures from analysis.
* Revert "fix: add CodeQL suppression comments and config for path injection alerts"
This reverts commit 798c412e348f7d407f8c376d3008949cf938e456.
---------
Co-authored-by: Michael Dwan <mdwan@cloudflare.com>