cog
471de5b7 - Add Go-based integration test framework using testscript (#2622)

Commit
135 days ago
Add Go-based integration test framework using testscript (#2622)

* Add Go-based integration test framework using testscript

Set up foundation for new integration tests:
- integration-tests/go.mod with testscript dependency
- harness package for cog binary resolution and custom commands
- suite_test.go with TestIntegration that discovers fixtures
- string-echo fixture with basic.txtar test case

The framework copies fixture files into isolated temp directories, provides a custom 'cog' command for testscript, and supports parallel test execution across fixtures.

Run tests with: cd integration-tests && go test -v

Closes cog-iws.1

* refactor(integration-tests): use embedded fixtures in txtar files

- Embed cog.yaml and predict.py directly in txtar test files
- Remove fixtures/ directory and file copying logic
- Simplify harness to just configure env vars (HOME, TEST_IMAGE)
- Simplify suite_test.go to point testscript at tests/ dir
- Group multiple assertions per fixture (build once, test many)
- Add unique image names and automatic cleanup per test

* feat(integration-tests): add on-demand binary building and more tests

- Add automatic cog binary building when COG_BINARY not set
- Builds wheels if needed, caches binary in .bin/cog
- Finds repo root by looking for go.mod with correct module
- Add 4 new test cases: async_predictor, env_vars, file_input, int_predictor
- Add .gitignore for cached binary directory

* feat(integration-tests): add Makefile target and 9 more test cases

- Add test-integration-go Makefile target with -parallel 4 throttling
- Add tests: optional_input, path_output, path_list_output, path_list_input, string_list_input, subdirectory_predictor, union_type, many_inputs, train_basic

Total: 14 integration tests now passing

* Port 13 additional integration tests to Go testscript framework

New tests:
- path_input_output: Path input/output with setup method
- path_input: Path input type reading file content
- file_list_input: list[File] input type
- complex_output: Pydantic BaseModel output
- function_predictor: Function-based predictor (no class)
- python313: Python 3.13 support
- pydantic2: Explicit Pydantic 2 dependency
- optional_path_input: Optional Path with None default
- future_annotations: __future__ annotations support
- int_none_output: Int return type returning None
- string_none_output: Str return type returning None
- invalid_int_validation: Schema validation for invalid defaults
- no_predictor: Error case for missing predictor

Total tests now: 27 (up from 14)

* Add Go integration tests to CI with runtime matrix

Runs the new testscript-based integration tests with:
- cog runtime (blocking)
- coglet-alpha runtime (non-blocking, may have error message differences)

* Port 22 additional integration tests to Go testscript framework

Phase 1 (8 tests): Simple build tests
- apt_packages, ffmpeg_package, zsh_package, local_whl_install
- install_requires_packaging, bad_dockerignore, pydantic2_output, python37_deprecated

Phase 2 (7 tests): Advanced features
- secrets, overrides, training_setup, fast_build
- migration, migration_no_python_changes, migration_gpu

Phase 3 (7 tests): Edge cases
- cog_runtime_float, cog_runtime_int, glb_project, granite_project
- async_sleep, complex_types, complex_types_list

Total ported: 49 tests (up from 27)

* Remove Python integration tests that are now ported to Go

Removed 39 Python tests and 39 fixture directories that have been fully ported to the Go testscript framework:
- Deleted test_migrate.py (3 tests → migration*.txtar)
- Deleted test_train.py (3 tests → train_basic, pydantic2_output, training_setup)
- Removed 12 tests from test_build.py
- Removed 21 tests from test_predict.py

Remaining Python tests: 46 (build: 18, config: 1, predict: 20, run: 7)

These cover functionality not yet ported: base images, labels, torch/tensorflow, subprocess handling, JSON I/O, cog run, pipeline tests.

* Fix COG_BINARY resolution to use repo root for relative paths

* Fix local_whl_install test to include proper WHEEL file in package

* Set BUILDKIT_PROGRESS=plain to reduce Docker output noise in tests

* Port framework/GPU tests to Go testscript with [slow] skip condition

- Add [slow] condition to harness (skip with COG_TEST_FAST=1)
- Add 7 new txtar tests for torch/tensorflow builds
- Remove ported Python tests and fixtures
- Update .gitignore for local planning files

* Increase test parallelism for Go integration tests

- Add TEST_PARALLEL env var to control concurrency (default 4)
- Set TEST_PARALLEL=8 in CI for 16-core runners
- Remove continue-on-error for coglet-alpha tests

* Add HTTP server testing support and port subprocess tests

Extends the Go testscript harness with HTTP server testing capabilities and completes migration of subprocess handling tests from Python.

## Harness Enhancements

- Add 'serve' command: starts cog serve in background with automatic port allocation, health checking, and cleanup
- Add 'curl' command: makes HTTP requests to running server for testing predictions via HTTP API
- Improve condition system: clarify [slow] condition to skip tests when COG_TEST_FAST=1

## New Tests (4 subprocess tests)

- setup_subprocess_simple.txtar: subprocess with SIGUSR1 signals
- setup_subprocess_double_fork.txtar: double fork daemonization
- setup_subprocess_double_fork_http.txtar: double fork + HTTP server
- setup_subprocess_multiprocessing.txtar: Python multiprocessing (currently skipped - needs debugging)

## Python Test Cleanup

Removed obsolete Python tests now covered by Go tests:
- Deleted test_predict_with_subprocess_in_setup (4 parameterized tests)
- Removed 4 subprocess fixture directories
- Reduced Python test count from 46 to 37

## Test Coverage

Total Go integration tests: 60 (up from 56)
Remaining Python tests: 37 (focus on CLI flags, cog run, JSON I/O)

The test suites are now complementary:
- Go tests: core predictor functionality, builds, types, server behavior
- Python tests: CLI flags (--json, -o), commands (cog run/init/install)

* Update CONTRIBUTING.md with new Go integration test workflow

- Document Go integration tests as primary test suite (60 tests)
- Add instructions for running Go tests with testscript
- Explain COG_TEST_FAST for skipping slow tests
- Show how to write new integration tests with .txtar format
- Add examples for basic predictor and server testing
- Update project structure to reflect integration-tests/ directory
- Clarify Python tests are supplementary (CLI flags & tooling)

* Make cog subcommand syntax consistent across tests

Use 'cog serve' instead of 'serve' for consistency with other cog subcommands (build, predict, etc.). This makes the test syntax clearer and sets up for adding more subcommands in the future.

Changes:
- Refactor cmdCog to use switch statement for subcommand routing
- Add comment showing where to add future subcommands (cog run, etc.)
- Update all subprocess tests to use 'cog serve'
- Update CONTRIBUTING.md examples
- Update PR description examples

The switch statement makes it easy to add special handling for other subcommands like 'cog run' in the future.

* chore: update gitignore

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

* Restore async-sleep-project fixture for test_concurrent_predictions

The test_concurrent_predictions Python test requires the async-sleep-project fixture to test concurrent async predictions with server shutdown. This test is unique and not covered by the Go async_sleep.txtar test. Recreated fixture from Go test for Python test compatibility.

* Skip flaky setup_subprocess_double_fork test in CI

This test consistently fails in CI with connection refused errors, suggesting the double forked process doesn't start reliably in the CI environment. The test passes locally but fails in CI for both cog and coglet integration tests.

Root cause needs investigation - likely related to timing/environment differences between local and CI, or how the double fork interacts with Docker in CI. Skipping for now to unblock CI while we investigate.

* Fix flaky subprocess integration tests with wait-for and retry-curl commands

Replace hard-coded sleep delays with proper synchronization mechanisms:
- Add wait-for command: Poll for file/http/content conditions with timeout
- Add retry-curl command: HTTP requests with automatic retry logic
- Update subprocess tests to signal readiness via files
- Remove skip markers from previously flaky tests

Subprocess test improvements:
- setup_subprocess_simple: Wait for .ready file, use retry-curl
- setup_subprocess_double_fork: Wait for .forked-ready file, 60s timeout
- setup_subprocess_double_fork_http: Wait for HTTP endpoint availability
- setup_subprocess_multiprocessing: Wait for .ponger-ready file

Benefits:
- Eliminates race conditions from fixed sleep delays
- CI-friendly with 60s timeouts for slower environments
- Self-documenting readiness requirements
- Resilient to timing variations between local and CI

Updated CONTRIBUTING.md with documentation and examples for new commands.

* Suppress Docker build output in integration tests using BUILDKIT_PROGRESS=quiet

Fix cog CLI to respect BUILDKIT_PROGRESS environment variable for the --progress flag default. Previously the CLI always defaulted to 'auto', ignoring the env var. Now the env var takes precedence.

Changes:
- pkg/cli/build.go: Check BUILDKIT_PROGRESS env var before defaulting
- integration-tests/harness/harness.go: Set BUILDKIT_PROGRESS=quiet

This makes integration test output much cleaner by hiding the verbose Docker build step-by-step progress (#1 [internal] load..., etc.) while still showing build status messages and any errors.

* Fix subprocess tests: remove wait-for file (doesn't work with Docker)

The wait-for file command checks for files on the host, but subprocess tests create files inside Docker containers started by cog serve. This caused all subprocess tests to timeout waiting for files that would never appear on the host.

Fix: Remove wait-for file usage and rely on retry-curl with generous retries (30 attempts, 1s delay) to handle subprocess initialization. The cog server's health check ensures the server is ready, and retries handle any additional subprocess startup time.

Changes:
- Remove wait-for file from all subprocess tests
- Remove file-based readiness signaling from Python scripts
- Increase retry-curl attempts to 30 for first prediction
- Update CONTRIBUTING.md to remove wait-for examples

* Add README for integration tests

Comprehensive documentation covering:
- Quick start commands
- Directory structure
- Writing tests (txtar format, embedded fixtures)
- Environment variables
- Custom commands (cog, curl, retry-curl, wait-for)
- Conditions ([slow])
- Built-in testscript commands
- Common test patterns
- Debugging tips
- Common issues and solutions

* Add editor support section to integration tests README

Document syntax highlighting options for .txtar files:
- VS Code: twpayne.vscode-testscript and brody715/vscode-txtar
- Zed: FollowTheProcess/zed-txtar
- Vim/Neovim: basic suggestions

* Fix health check to wait for READY status before returning

The waitForServer function was only checking for HTTP 200 status, but the cog server returns 200 even during STARTING state while setup() is still running. This caused race conditions where tests would start making predictions before setup completed.

Changes:
- Update waitForServer to parse the JSON response and wait for status=READY (meaning setup completed successfully)
- Return early if status is SETUP_FAILED or DEFUNCT
- Increase HTTP client timeout to 5s for more reliable health checks
- Capture server stdout/stderr for better debugging on failures

Also fix setup_subprocess_multiprocessing.txtar:
- Skip directories when cleaning up *.tmp files (was failing on .tmp dir)
- Update assertions to check logs instead of output format (Path returns base64-encoded content, not the file path)

* Fix race condition and cleanup in test harness

- Add mutex to protect serverProcs map from concurrent access
- Key serverProcs by work directory instead of TestScript pointer
- Fix cleanup to only stop current test's server, not all servers
- Use errors.Is(err, io.EOF) instead of string comparison

* Fix flaky double_fork_http test: wait for HTTP server to be ready

The test spawns a background HTTP server during setup() but wasn't waiting for it to be ready before returning. This caused predict() to fail with connection refused when trying to connect to the server.

Added a retry loop in setup() to wait up to 15 seconds for the background HTTP server to accept connections.

---------

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
Co-authored-by: Matt Dwan <mdwan@replicate.com>
Co-authored-by: Mark Phelps <mphelps@cloudflare.com>
Co-authored-by: Mark Phelps <209477+markphelps@users.noreply.github.com>