feat: Replace Pydantic with native Python dataclasses for cog.BaseModel (#2681)
* chore: remove pydantic-based cog implementation
Remove the legacy pydantic-based Python SDK to prepare for the
dataclass-based implementation. This includes all server code,
type definitions, and associated tests.
* feat: add dataclass-based cog implementation
Replace pydantic with a pure dataclass-based implementation:
- Type inspection without pydantic overhead
- Schema generation using native Python types
- Custom coder system for complex type serialization
- API compatible with existing predictors
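A minimal sketch of what schema generation from native Python types could look like, assuming a plain dataclass as the model. The names `schema_for`, `_JSON_TYPES`, and `Output` are hypothetical and are not taken from the cog codebase; the real implementation may differ.

```python
import dataclasses
from typing import Any, Dict, get_type_hints

# Hypothetical mapping from Python primitives to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for(cls: type) -> Dict[str, Any]:
    """Build a minimal JSON-Schema-like dict from a dataclass's fields."""
    hints = get_type_hints(cls)
    properties: Dict[str, Any] = {}
    required = []
    for field in dataclasses.fields(cls):
        # Unknown types fall back to "object" in this sketch.
        properties[field.name] = {"type": _JSON_TYPES.get(hints[field.name], "object")}
        has_default = (
            field.default is not dataclasses.MISSING
            or field.default_factory is not dataclasses.MISSING
        )
        if not has_default:
            required.append(field.name)
    return {"type": "object", "properties": properties, "required": required}


@dataclasses.dataclass
class Output:  # hypothetical structured output type
    text: str
    score: float = 0.0
```

Fields without a default are reported as required, which is one plausible convention rather than a confirmed one.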
* refactor: simplify to single cog wheel
Remove multi-wheel complexity now that pydantic-based cog is replaced:
- pkg/wheels: embed only the cog wheel, remove cog-dataclass
- pkg/dockerfile: simplify wheel installation to single embedded wheel
- integration-tests: remove cog_dataclass condition
- CI: remove dataclass-specific test matrix entries
- tox: remove pydantic version matrix
- mise: consolidate coglet-python test task
* test: remove pydantic-specific integration tests
Delete tests that specifically test pydantic 1.x/2.x behavior which is
no longer relevant with the dataclass-based implementation.
* test: unskip complex_output test
The dataclass implementation handles Pydantic BaseModel outputs via
duck typing: cog/json.py:make_encodeable() checks for a model_dump()
method (v2) or a dict() method (v1). Users can still use Pydantic for
their own model types.
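The duck-typing check described above might look roughly like the following. This is an illustrative sketch only: `make_encodeable_sketch` is a hypothetical name, and the actual logic in cog/json.py may order or handle these cases differently.

```python
import dataclasses
from typing import Any


def make_encodeable_sketch(obj: Any) -> Any:
    """Recursively convert obj to JSON-friendly builtins (illustrative only)."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return make_encodeable_sketch(dataclasses.asdict(obj))
    # Duck typing: pydantic v2 models expose model_dump(), v1 models expose dict().
    model_dump = getattr(obj, "model_dump", None)
    if callable(model_dump):
        return make_encodeable_sketch(model_dump())
    to_dict = getattr(obj, "dict", None)
    if callable(to_dict):
        return make_encodeable_sketch(to_dict())
    if isinstance(obj, dict):
        return {k: make_encodeable_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [make_encodeable_sketch(v) for v in obj]
    return obj
```

Because the check relies only on method names, it needs no import of pydantic itself.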
* test: unskip setup_subprocess_multiprocessing test
Remove obsolete skips - the test uses Python 3.10 which is supported.
Verified passing with both Python and Rust coglet servers.
* test: unskip torch_baseimage tests
Remove obsolete skips - the tests use Python 3.10 which is supported.
These are slow tests that will run in CI (not -short mode).
* test: unskip build_cog_version_match test
Remove obsolete skips. This test verifies cog version in base images.
Verified passing with both Python and Rust coglet servers.
* test: remove coglet_alpha skips from integration tests
coglet_alpha is no longer a supported configuration - remove all skips.
* refactor(coglet): remove pydantic-specific code paths
- Simplify format_validation_error to use cog's already-formatted errors
- Remove unwrap_pydantic_serialization_iterators (no longer needed)
- Remove schema_via_fastapi fallback, use cog._schemas directly
- Update Runtime enum: remove Pydantic variant, rename NonPydantic to Cog
- Update SdkImplementation: remove Pydantic/Dataclass, use Cog/Unknown
- Update detection to check for cog._adt module
- Update comments to remove pydantic references
* test: update complex_output to use cog.BaseModel instead of pydantic.BaseModel
pydantic.BaseModel outputs are no longer supported. Users should use
cog.BaseModel (a dataclass) or @dataclass for structured outputs.
* feat: implement user-defined healthcheck support for Python server
Add support for user-defined healthcheck() method on predictors:
- Add Healthcheck event type to eventtypes.py
- Add get_healthcheck() helper to predictor.py
- Add healthcheck() method to Worker and _ChildWorker classes
- Add healthcheck() to PredictionRunner
- Update /health-check endpoint to call user healthcheck
- Add UNHEALTHY status to Health enum
Features:
- Sync and async healthcheck methods supported
- 5 second timeout for healthcheck execution
- Returns UNHEALTHY with error details on failure/timeout/exception
Remove [cog_dataclass] skip from healthcheck integration tests.
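For illustration, a predictor with a user-defined healthcheck and a lookup helper could be sketched as follows. The `Predictor` class here is a plain class rather than a real cog predictor, and `get_healthcheck_sketch` and `is_async_healthcheck` are hypothetical stand-ins for the helpers named above; their real signatures are not shown in this message.

```python
import inspect
from typing import Any, Callable, Optional


class Predictor:  # hypothetical user predictor, kept free of cog imports
    def setup(self) -> None:
        self.ready = True

    def predict(self, text: str) -> str:
        return text.upper()

    def healthcheck(self) -> bool:
        # Return True when healthy, False when unhealthy.
        return self.ready


def get_healthcheck_sketch(predictor: Any) -> Optional[Callable[[], Any]]:
    """Return the predictor's healthcheck method, or None if it has none."""
    fn = getattr(predictor, "healthcheck", None)
    return fn if callable(fn) else None


def is_async_healthcheck(fn: Callable[[], Any]) -> bool:
    """True when the healthcheck is declared with `async def`."""
    return inspect.iscoroutinefunction(fn)
```

A server could use the second helper to decide whether to await the healthcheck or run it in a worker thread.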
* feat(coglet): implement user-defined healthcheck support
Add healthcheck support to coglet-rust:
Protocol:
- Add ControlRequest::Healthcheck and ControlResponse::HealthcheckResult
- Add HealthcheckStatus enum (Healthy/Unhealthy)
Orchestrator:
- Add HealthcheckResult type with healthy()/unhealthy() constructors
- Add healthcheck() method to Orchestrator trait
- Implement request/response flow via control channel
- Add semaphore to prevent concurrent healthchecks (skip if busy)
- Handle healthcheck results in event loop
HTTP:
- Add HealthResponse enum (includes transient UNHEALTHY state)
- Update /health-check to call user healthcheck when ready
- Return user_healthcheck_error in response on failure
Worker:
- Add healthcheck() to PredictHandler trait (default: healthy)
- Handle Healthcheck requests in worker event loop
Python integration (coglet-python):
- Add has_healthcheck() and is_healthcheck_async() to PythonPredictor
- Implement healthcheck_sync() with ThreadPoolExecutor + 5s timeout
- Implement healthcheck_async() with asyncio.wait_for + 5s timeout
- Wire up in PythonPredictHandler::healthcheck()
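The async path with asyncio.wait_for and a 5 second timeout can be sketched as below. The function name, return shape, and error strings are assumptions made for illustration, as is the choice to treat only an explicit False as unhealthy.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

HEALTHCHECK_TIMEOUT_S = 5.0  # timeout value stated in the commit text


async def run_async_healthcheck(
    healthcheck: Callable[[], Awaitable[object]],
    timeout: float = HEALTHCHECK_TIMEOUT_S,
) -> Tuple[bool, Optional[str]]:
    """Return (healthy, error); the error strings here are hypothetical."""
    try:
        result = await asyncio.wait_for(healthcheck(), timeout=timeout)
    except asyncio.TimeoutError:
        return False, f"healthcheck timed out after {timeout:.1f} seconds"
    except Exception as exc:
        return False, f"healthcheck raised: {exc}"
    # Assumption: only an explicit False marks the predictor unhealthy.
    if result is False:
        return False, "healthcheck returned False"
    return True, None
```

asyncio.wait_for cancels the pending coroutine on timeout, so a slow async healthcheck does not keep running in the background.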
* test: add async healthcheck integration tests and enable coglet_rust
- Remove [coglet_rust] skip from existing sync healthcheck tests
- Add async healthcheck tests:
  - healthcheck_async_custom: async healthcheck returning True
  - healthcheck_async_unhealthy: async healthcheck returning False
  - healthcheck_async_exception: async healthcheck raising exception
  - healthcheck_async_timeout: async healthcheck timing out (>5s)
* fix: resolve pyright type errors and lint issues
Python type fixes:
- _adt.py: Fix type hints for PrimitiveType methods to handle Any
- config.py: Add type arguments to dict types
- input.py: Add cast for default_factory, add type ignore for field()
- coder.py: Rename factory parameter from cls to tpe (static method)
- coders/*.py: Match renamed parameter in factory method overrides
- http.py: Add type ignores for dynamic FastAPI types and coglet module
- _inspector.py: Remove unused imports, add 'from None' to re-raises
Makefile:
- Update tox env from typecheck-pydantic2 to typecheck (pydantic removed)
Cleanup:
- Remove unused warnings import from _inspector.py
- Remove experimental coders warning
* fix(coglet): correct healthcheck timeout message format and harness
- Change timeout format from {} to {:.1} to output '5.0' instead of '5'
- Update test harness waitForServer to accept UNHEALTHY and BUSY as valid 'ready' states
* docs: remove all Pydantic references
- Remove Pydantic compat code from cog.Path
- Update README, docs/python.md, docs/llms.txt
- Clean up comments referencing pydantic
* chore: remove pydantic dependency and cog-dataclass scaffold
- Remove pydantic from dependencies in pyproject.toml
- Simplify dependencies to minimal set
- Remove PYDANTIC_V2 constant from pyright config
- Delete cog-dataclass/ directory (it was a scaffold; the code now lives in python/cog/)
* fix: resolve CI failures (lint, Go tests, and CodeQL warning)
- Remove unused Type import from types.py
- Remove pydantic from Go dockerfile test expectation
- Remove pydantic comment from requirements_test.go
- Fix pyright warnings in openapi_schema.py (use Any type)
- Sanitize validation error messages to first line only
* test: fix healthcheck timeout tests to use trigger-based approach
Use prediction to trigger slow healthcheck mode instead of relying on
call counting, which was flaky due to harness also calling healthcheck.
* fix: remove unused imports in openapi_schema.py
* fix: properly timeout sync healthchecks in Python server
Use ThreadPoolExecutor with shutdown(wait=False) to avoid blocking
when a sync healthcheck exceeds the timeout. Previously, exiting the
executor's context manager called shutdown(wait=True), which waited for
the thread to complete even after the timeout had fired.
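A sketch of the non-blocking pattern, assuming a single-worker executor created per call. The function name, return shape, and messages are hypothetical, and `slow_check` exists only to demonstrate an overrunning healthcheck.

```python
import concurrent.futures
import time
from typing import Callable, Optional, Tuple


def run_sync_healthcheck(
    healthcheck: Callable[[], object], timeout: float = 5.0
) -> Tuple[bool, Optional[str]]:
    """Run a sync healthcheck in a thread and give up after `timeout` seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(healthcheck)
        result = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return False, f"healthcheck timed out after {timeout:.1f} seconds"
    except Exception as exc:
        return False, f"healthcheck raised: {exc}"
    finally:
        # Do not wait for a still-running healthcheck thread. A `with` block
        # would call shutdown(wait=True) and block until the thread finishes.
        executor.shutdown(wait=False)
    # Assumption: only an explicit False marks the predictor unhealthy.
    if result is False:
        return False, "healthcheck returned False"
    return True, None


def slow_check() -> bool:
    """Hypothetical healthcheck that overruns a short timeout."""
    time.sleep(0.5)
    return True
```

The abandoned thread still runs to completion in the background; Python cannot forcibly stop it, so this only unblocks the caller.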
* fix: sanitize validation error messages to prevent info leakage
Add _sanitize_validation_message() that only passes through known safe
validation patterns (Field required, Invalid value, fails constraint,
does not match regex/choices). Unknown messages are replaced with
generic 'Invalid value' to prevent potential stack trace or internal
details from reaching clients.
This addresses CodeQL security warning about information exposure.
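An allow-list sanitizer along these lines might be sketched as follows. The regular expressions are guesses based on the pattern names listed above, and `sanitize_validation_message_sketch` is a hypothetical name; the real _sanitize_validation_message() may match differently.

```python
import re

# Hypothetical allow-list mirroring the pattern names in the commit text.
_SAFE_PATTERNS = (
    re.compile(r"^Field required$"),
    re.compile(r"^Invalid value\b"),
    re.compile(r"fails constraint"),
    re.compile(r"does not match (regex|choices)"),
)
_GENERIC_MESSAGE = "Invalid value"


def sanitize_validation_message_sketch(message: str) -> str:
    """Keep only the first line, and only if it matches a known safe pattern."""
    lines = message.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    if any(pattern.search(first_line) for pattern in _SAFE_PATTERNS):
        return first_line
    # Anything unrecognized (e.g. a traceback) is replaced wholesale.
    return _GENERIC_MESSAGE
```

Defaulting to the generic message means new, unreviewed error formats stay hidden from clients until they are explicitly added to the allow-list.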