cog
dfa4fc3c - feat: chunked image push via OCI compatible push (#2760)

Commit
14 days ago
feat: chunked image push via OCI compatible push (#2760)

* feat: chunked image push via OCI layout

Replace Docker's monolithic ImagePush with a chunked push path for container image layers. Images are exported from the Docker daemon to OCI layout via ImageSave, then pushed through the registry client's existing chunked upload infrastructure (WriteLayer with 256MB chunks). This bypasses the ~500MB Cloudflare Workers request body limit that blocks Docker's native push for large layers.

Key changes:
- Add OCIImagePusher to pkg/registry/ with concurrent layer uploads
- Export images from Docker daemon to OCI layout via ImageSave + tarball
- Integrate into Resolver.Push and BundlePusher with Docker push fallback
- Add ImageSave method to command.Command interface
- Delete unused tools/uploader/ S3 multipart code (363 lines)

* refactor: reorganize OCI push into pkg/oci, pkg/registry/config, and pkg/model

Move OCI layout utilities to pkg/oci/, extract registry transport config (chunk size, multipart threshold env vars) to pkg/registry/config.go, and relocate OCIImagePusher to pkg/model/ alongside ImagePusher and WeightPusher.
- pkg/oci/: pure OCI format utilities (Docker tar <-> OCI layout), no registry deps
- pkg/registry/config.go: configurable chunk size and multipart threshold
- pkg/model/oci_image_pusher.go: push orchestration with shared pushImageWithFallback()
- Deduplicate fallback logic between resolver.go and pusher.go
- Add error discrimination: no fallback on auth errors or context cancellation
- Create OCIImagePusher once in NewResolver, not per-push call

* chore: minor cleanup

* refactor: consolidate push progress and concurrency across image and weight pushers

- Unify ImagePushProgress and WeightPushProgress into shared PushProgress type
- Extract writeLayerWithProgress() helper to deduplicate progress channel boilerplate between OCIImagePusher and WeightPusher
- Unify push concurrency: both image layer pushes and weight pushes use GetPushConcurrency() (default 4, overridable via COG_PUSH_CONCURRENCY)
- Fix BundlePusher.pushWeights() which had no concurrency limit (launched all goroutines at once); now uses errgroup.SetLimit
- Implement auth error detection in shouldFallbackToDocker() to match its documented behavior (don't fall back on UNAUTHORIZED/DENIED errors)

* refactor: remove auth string matching from shouldFallbackToDocker

String-based error detection is fragile. Fall back to Docker push on any error except context cancellation/timeout.

* refactor: replace hand-rolled progress tracker with mpb library

Replace ~200 lines of custom ANSI escape progress rendering with the mpb (multi-progress-bar) library, which was already a dependency but unused. mpb handles TTY detection, cursor management, concurrent bar updates, and size formatting natively. Retry status is shown via a dynamic decorator.

* fix: add WithAutoRefresh to mpb progress container to prevent deadlock in non-TTY

* fix: create mpb bars with total=0 to avoid triggerComplete blocking SetTotal completion

When bars are created with total > 0, mpb sets triggerComplete=true internally.
This causes SetTotal(n, true) to early-return without triggering completion, so bars never finish and p.Wait() deadlocks. Creating bars with total=0 leaves triggerComplete=false, allowing explicit completion via SetTotal(current, true) after push finishes. The real total is still set dynamically via ProgressFn callbacks.

* refactor: consolidate OCIImagePusher into ImagePusher

Merge OCIImagePusher (OCI chunked push) and the old ImagePusher (Docker push) into a single ImagePusher type that tries OCI first and falls back to Docker push on non-fatal errors.

- ImagePusher.Push() handles OCI→Docker fallback internally
- Delete OCIImagePusher type and oci_image_pusher.go
- BundlePusher takes *ImagePusher directly instead of separate oci/docker pushers
- Resolver stores single imagePusher field instead of ociPusher
- Remove dead Pusher interface
- Consolidate tests into image_pusher_test.go

* refactor: remove ImageSaveFunc abstraction and pkg/oci package

ImagePusher now calls p.docker.ImageSave() directly instead of going through the oci.ImageSaveFunc indirection. The OCI layout export logic is inlined into ImagePusher.ociPush(). The pkg/oci package is deleted entirely since it had no other consumers.

* feat: gate OCI chunked push behind COG_PUSH_OCI=1 env var

OCI push is now opt-in rather than always-on when a registry client is present. Requires COG_PUSH_OCI=1 to activate.

* chore: mod tidy

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

* feat: add progress bars for OCI image push and force HTTP/1.1 transport

- Add dynamic mpb progress bars for per-layer upload progress during OCI push
- Wire ImageProgressFn through PushOptions → Resolver → ImagePusher
- Force HTTP/1.1 for registry chunked uploads to avoid HTTP/2 RST_STREAM errors
- Add HTTP/2 stream errors to isRetryableError for retry resilience
- Reduce default chunk size from 256MB to 95MB to stay under CDN body limits

* feat: respect OCI-Chunk-Min/Max-Length headers from registry

Parse OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the registry's upload initiation response (POST /v2/.../blobs/uploads/). The server-advertised maximum always takes precedence over client defaults, and the result is clamped to be at least the server minimum.

Rename COG_PUSH_CHUNK_SIZE to COG_PUSH_DEFAULT_CHUNK_SIZE to clarify that it is only a fallback for registries that don't advertise limits.

* fix: replace mpb with Docker jsonmessage for push progress display

Replace the mpb progress bar library with Docker's jsonmessage rendering (the same code used by `docker push`) for OCI layer and weight upload progress. This fixes terminal corruption when the terminal is resized during a push.

Root cause: mpb writes all bars then uses a bulk cursor-up (CUU N) to reposition. When the terminal shrinks, previously rendered lines wrap to occupy more visual lines, but the cursor-up count stays at the logical count, leaving ghost copies of progress bars on screen. Docker's jsonmessage avoids this by erasing and rewriting each line individually (ESC[2K + per-line cursor up/down), and re-querying terminal width on every render via ioctl(TIOCGWINSZ).

Also removes the mpb dependency entirely from go.mod.

* fix: sanitize OCI push error output and clear progress on Docker fallback

Strip HTML response bodies from transport errors (e.g., Cloudflare 413 pages) before displaying to user.
Add OnFallback callback to close the progress writer before Docker push starts, preventing stale OCI progress bars from lingering above Docker's output.

* fix: address code review findings for OCI chunked push

- Remove unused OCI layout directory creation that doubled disk I/O (C1)
- Randomize retry jitter to avoid thundering herd (W2)
- Skip Docker fallback on 401/403 auth errors since they'd fail identically (W3)
- Pool chunk buffers via sync.Pool to reduce memory pressure (W4)
- Suppress duplicate retry log messages in TTY mode (S5)

* chore: bump default push concurrency from 4 to 5 to match Docker's default

* fix: address PR review feedback for OCI chunked push

- Fix multipart threshold/chunksize inversion: raise DefaultMultipartThreshold from 50MB to 128MB so blobs that fit in a single chunk avoid unnecessary multipart overhead
- Use HTTP/1.1 for all registry operations, not just chunked uploads, to avoid HTTP/2 head-of-line blocking and stream errors on large uploads
- Move ProgressWriter from pkg/cli to pkg/docker since it wraps Docker's jsonmessage rendering and belongs with Docker concerns
- Replace manual semaphore+WaitGroup in weights push with errgroup for bounded concurrency and first-error cancellation
- Refactor ImagePusher.Push to accept *ImageArtifact instead of a raw string, preserving the resolved reference from the Model type
- Unify BundlePusher construction: NewBundlePusher now takes docker+registry clients and creates both sub-pushers internally
- Clarify that tarball.ImageFromPath is lazy (reads layers on-demand from the temp file, not into memory)

* chore: use 96mb for default chunk size, its cleaner

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

* refactor: use functional options pattern for ImagePusher.Push

Replace the variadic struct slice ([]ImagePushOptions) with idiomatic functional options (WithProgressFn, WithOnFallback).
The resolved config struct is now unexported, and callers compose options cleanly:

    pusher.Push(ctx, artifact, WithProgressFn(fn), WithOnFallback(fn))

* chore: remove dead HTTP/2 stream error handling and fix stale size comments

Remove isHTTP2StreamError() and its test cases; HTTP/2 is no longer possible since we force HTTP/1.1 on all registry operations. Fix stale comments referencing 95 MB chunk size (now 96 MB), 50 MB multipart threshold (now 128 MiB), and concurrency 4 (now 5).

* chore: clean up stale comments and clarify buffer pool reuse

- Remove outdated 'Requires registry client' comment from canOCIPush (nil registry is no longer a concern)
- Simplify sanitizeError doc comment
- Add context to DefaultPushConcurrency matching Docker's default
- Document why pooled chunk buffers don't need zeroing

* chore: minor cleanup

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

---------

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>