feat: chunked image push via OCI-compatible push (#2760)
* feat: chunked image push via OCI layout
Replace Docker's monolithic ImagePush with a chunked push path for
container image layers. Images are exported from the Docker daemon to
OCI layout via ImageSave, then pushed through the registry client's
existing chunked upload infrastructure (WriteLayer with 256MB chunks).
This bypasses the ~500MB Cloudflare Workers request body limit that
blocks Docker's native push for large layers.
Key changes:
- Add OCIImagePusher to pkg/registry/ with concurrent layer uploads
- Export images from Docker daemon to OCI layout via ImageSave + tarball
- Integrate into Resolver.Push and BundlePusher with Docker push fallback
- Add ImageSave method to command.Command interface
- Delete unused tools/uploader/ S3 multipart code (363 lines)
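A minimal sketch of the chunking arithmetic behind the chunked upload path, assuming fixed-size chunks and the inclusive byte ranges that OCI chunked uploads carry in Content-Range. The names chunkRange and splitChunks are hypothetical and are not taken from the registry client.

```go
package main

import "fmt"

// chunkRange is an inclusive byte range, the form carried in the
// Content-Range header of an OCI chunked upload (for example "0-1023").
type chunkRange struct {
	Start, End int64
}

func (r chunkRange) String() string { return fmt.Sprintf("%d-%d", r.Start, r.End) }

// splitChunks (hypothetical helper) divides a blob of size bytes into
// consecutive ranges of at most chunkSize bytes each.
func splitChunks(size, chunkSize int64) []chunkRange {
	if size <= 0 || chunkSize <= 0 {
		return nil
	}
	var out []chunkRange
	for start := int64(0); start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end > size-1 {
			end = size - 1 // the final chunk may be shorter
		}
		out = append(out, chunkRange{Start: start, End: end})
	}
	return out
}

func main() {
	const mb = 1 << 20
	// A 600MB layer split into 256MB chunks stays under a ~500MB body limit.
	for _, r := range splitChunks(600*mb, 256*mb) {
		fmt.Println(r)
	}
}
```

Each range would map to one upload request, so no single request body exceeds the chunk size.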
* refactor: reorganize OCI push into pkg/oci, pkg/registry/config, and pkg/model
Move OCI layout utilities to pkg/oci/, extract registry transport config
(chunk size, multipart threshold env vars) to pkg/registry/config.go, and
relocate OCIImagePusher to pkg/model/ alongside ImagePusher and WeightPusher.
- pkg/oci/: pure OCI format utilities (Docker tar <-> OCI layout), no registry deps
- pkg/registry/config.go: configurable chunk size and multipart threshold
- pkg/model/oci_image_pusher.go: push orchestration with shared pushImageWithFallback()
- Deduplicate fallback logic between resolver.go and pusher.go
- Add error discrimination: no fallback on auth errors or context cancellation
- Create OCIImagePusher once in NewResolver, not per-push call
* chore: minor cleanup
* refactor: consolidate push progress and concurrency across image and weight pushers
- Unify ImagePushProgress and WeightPushProgress into shared PushProgress type
- Extract writeLayerWithProgress() helper to deduplicate progress channel
boilerplate between OCIImagePusher and WeightPusher
- Unify push concurrency: both image layer pushes and weight pushes use
GetPushConcurrency() (default 4, overridable via COG_PUSH_CONCURRENCY)
- Fix BundlePusher.pushWeights() which had no concurrency limit (launched
all goroutines at once); now uses errgroup.SetLimit
- Implement auth error detection in shouldFallbackToDocker() to match its
documented behavior (don't fall back on UNAUTHORIZED/DENIED errors)
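The bounded-concurrency behavior described above can be sketched as follows. This is an illustration only: it swaps errgroup.SetLimit for a buffered-channel semaphore so the example needs nothing beyond the standard library, and pushConcurrency, runBounded, and peakInFlight are hypothetical names (the commit's own helper is GetPushConcurrency()).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"sync"
	"sync/atomic"
	"time"
)

const defaultPushConcurrency = 4 // default at this point in the history

var errUpload = errors.New("upload failed")

// pushConcurrency reads COG_PUSH_CONCURRENCY, falling back to the default
// when the variable is unset or not a positive integer.
func pushConcurrency() int {
	if n, err := strconv.Atoi(os.Getenv("COG_PUSH_CONCURRENCY")); err == nil && n > 0 {
		return n
	}
	return defaultPushConcurrency
}

// runBounded runs every task with at most limit tasks in flight and returns
// the first error observed. Unlike errgroup.WithContext it does not cancel
// the remaining tasks.
func runBounded(limit int, tasks []func() error) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var once sync.Once
	var firstErr error
	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(task func() error) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := task(); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(task)
	}
	wg.Wait()
	return firstErr
}

// peakInFlight runs n short tasks through runBounded and reports the highest
// number that were observed running at the same time.
func peakInFlight(limit, n int) int64 {
	var inFlight, peak atomic.Int64
	tasks := make([]func() error, n)
	for i := range tasks {
		tasks[i] = func() error {
			cur := inFlight.Add(1)
			defer inFlight.Add(-1)
			for {
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			time.Sleep(time.Millisecond)
			return nil
		}
	}
	_ = runBounded(limit, tasks)
	return peak.Load()
}

func main() {
	limit := pushConcurrency()
	fmt.Println("limit:", limit, "peak:", peakInFlight(limit, 20))
	err := runBounded(2, []func() error{
		func() error { return nil },
		func() error { return errUpload },
	})
	fmt.Println("first error:", err)
}
```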
* refactor: remove auth string matching from shouldFallbackToDocker
String-based error detection is fragile. Fall back to Docker push on any
error except context cancellation/timeout.
* refactor: replace hand-rolled progress tracker with mpb library
Replace ~200 lines of custom ANSI escape progress rendering with the mpb
(multi-progress-bar) library, which was already a dependency but unused.
mpb handles TTY detection, cursor management, concurrent bar updates, and
size formatting natively. Retry status is shown via a dynamic decorator.
* fix: add WithAutoRefresh to mpb progress container to prevent deadlock in non-TTY
* fix: create mpb bars with total=0 to avoid triggerComplete blocking SetTotal completion
When bars are created with total > 0, mpb sets triggerComplete=true
internally. This causes SetTotal(n, true) to early-return without
triggering completion, so bars never finish and p.Wait() deadlocks.
Creating bars with total=0 leaves triggerComplete=false, allowing
explicit completion via SetTotal(current, true) after push finishes.
The real total is still set dynamically via ProgressFn callbacks.
* refactor: consolidate OCIImagePusher into ImagePusher
Merge OCIImagePusher (OCI chunked push) and the old ImagePusher (Docker
push) into a single ImagePusher type that tries OCI first and falls back
to Docker push on non-fatal errors.
- ImagePusher.Push() handles OCI→Docker fallback internally
- Delete OCIImagePusher type and oci_image_pusher.go
- BundlePusher takes *ImagePusher directly instead of separate oci/docker pushers
- Resolver stores single imagePusher field instead of ociPusher
- Remove dead Pusher interface
- Consolidate tests into image_pusher_test.go
* refactor: remove ImageSaveFunc abstraction and pkg/oci package
ImagePusher now calls p.docker.ImageSave() directly instead of going
through the oci.ImageSaveFunc indirection. The OCI layout export logic
is inlined into ImagePusher.ociPush(). The pkg/oci package is deleted
entirely since it had no other consumers.
* feat: gate OCI chunked push behind COG_PUSH_OCI=1 env var
OCI push is now opt-in rather than always-on when a registry client
is present. Requires COG_PUSH_OCI=1 to activate.
* chore: mod tidy
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
* feat: add progress bars for OCI image push and force HTTP/1.1 transport
- Add dynamic mpb progress bars for per-layer upload progress during OCI push
- Wire ImageProgressFn through PushOptions → Resolver → ImagePusher
- Force HTTP/1.1 for registry chunked uploads to avoid HTTP/2 RST_STREAM errors
- Add HTTP/2 stream errors to isRetryableError for retry resilience
- Reduce default chunk size from 256MB to 95MB to stay under CDN body limits
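One way to force HTTP/1.1 on a Go HTTP client is sketched below. It relies on the documented net/http behavior that a non-nil, empty TLSNextProto map disables HTTP/2. Whether the registry client is configured exactly this way is an assumption, and newHTTP1Transport is a hypothetical name.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newHTTP1Transport returns a copy of the default transport that never
// negotiates HTTP/2.
func newHTTP1Transport() *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.ForceAttemptHTTP2 = false
	// A non-nil, empty map tells net/http not to enable HTTP/2 over TLS.
	t.TLSNextProto = map[string]func(string, *tls.Conn) http.RoundTripper{}
	return t
}

func main() {
	client := &http.Client{Transport: newHTTP1Transport()}
	tr := client.Transport.(*http.Transport)
	fmt.Println("ForceAttemptHTTP2:", tr.ForceAttemptHTTP2)
	fmt.Println("registered ALPN upgrades:", len(tr.TLSNextProto))
}
```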
* feat: respect OCI-Chunk-Min/Max-Length headers from registry
Parse OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the
registry's upload initiation response (POST /v2/.../blobs/uploads/).
The server-advertised maximum always takes precedence over client
defaults, and the result is clamped to be at least the server minimum.
Rename COG_PUSH_CHUNK_SIZE to COG_PUSH_DEFAULT_CHUNK_SIZE to clarify
that it is only a fallback for registries that don't advertise limits.
* fix: replace mpb with Docker jsonmessage for push progress display
Replace the mpb progress bar library with Docker's jsonmessage rendering
(the same code used by `docker push`) for OCI layer and weight upload
progress. This fixes terminal corruption when the terminal is resized
during a push.
Root cause: mpb writes all bars then uses a bulk cursor-up (CUU N) to
reposition. When the terminal shrinks, previously rendered lines wrap to
occupy more visual lines, but the cursor-up count stays at the logical
count, leaving ghost copies of progress bars on screen.
Docker's jsonmessage avoids this by erasing and rewriting each line
individually (ESC[2K + per-line cursor up/down), and re-querying
terminal width on every render via ioctl(TIOCGWINSZ).
Also removes the mpb dependency entirely from go.mod.
* fix: sanitize OCI push error output and clear progress on Docker fallback
Strip HTML response bodies from transport errors (e.g., Cloudflare 413
pages) before displaying them to the user. Add an OnFallback callback that
closes the progress writer before Docker push starts, preventing stale OCI
progress bars from lingering above Docker's output.
* fix: address code review findings for OCI chunked push
- Remove unused OCI layout directory creation that doubled disk I/O (C1)
- Randomize retry jitter to avoid thundering herd (W2)
- Skip Docker fallback on 401/403 auth errors since they'd fail identically (W3)
- Pool chunk buffers via sync.Pool to reduce memory pressure (W4)
- Suppress duplicate retry log messages in TTY mode (S5)
* chore: bump default push concurrency from 4 to 5 to match Docker's default
* fix: address PR review feedback for OCI chunked push
- Fix multipart threshold/chunksize inversion: raise DefaultMultipartThreshold
from 50MB to 128MB so blobs that fit in a single chunk avoid unnecessary
multipart overhead
- Use HTTP/1.1 for all registry operations, not just chunked uploads, to avoid
HTTP/2 head-of-line blocking and stream errors on large uploads
- Move ProgressWriter from pkg/cli to pkg/docker since it wraps Docker's
jsonmessage rendering and belongs with Docker concerns
- Replace manual semaphore+WaitGroup in weights push with errgroup for
bounded concurrency and first-error cancellation
- Refactor ImagePusher.Push to accept *ImageArtifact instead of a raw string,
preserving the resolved reference from the Model type
- Unify BundlePusher construction: NewBundlePusher now takes docker+registry
clients and creates both sub-pushers internally
- Clarify that tarball.ImageFromPath is lazy (reads layers on-demand from the
temp file, not into memory)
* chore: use 96MB for default chunk size, it's cleaner
* refactor: use functional options pattern for ImagePusher.Push
Replace the variadic struct slice ([]ImagePushOptions) with idiomatic
functional options (WithProgressFn, WithOnFallback). The resolved config
struct is now unexported, and callers compose options cleanly:
    pusher.Push(ctx, artifact, WithProgressFn(fn), WithOnFallback(fn))
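A self-contained sketch of the functional options pattern named above. The option names WithProgressFn and WithOnFallback come from this commit. The callback signatures, the pushConfig fields, and the push stand-in are assumptions made for illustration.

```go
package main

import "fmt"

// pushConfig is the resolved, unexported configuration.
type pushConfig struct {
	progressFn func(done, total int64)
	onFallback func()
}

// PushOption mutates the configuration before a push starts.
type PushOption func(*pushConfig)

func WithProgressFn(fn func(done, total int64)) PushOption {
	return func(c *pushConfig) { c.progressFn = fn }
}

func WithOnFallback(fn func()) PushOption {
	return func(c *pushConfig) { c.onFallback = fn }
}

// resolveOptions applies the options in order and skips nil entries.
func resolveOptions(opts ...PushOption) pushConfig {
	var cfg pushConfig
	for _, opt := range opts {
		if opt != nil {
			opt(&cfg)
		}
	}
	return cfg
}

// push is a stand-in for ImagePusher.Push. It pretends the primary path
// failed when failPrimary is set, so the fallback callback can be seen.
func push(total int64, failPrimary bool, opts ...PushOption) {
	cfg := resolveOptions(opts...)
	if failPrimary {
		if cfg.onFallback != nil {
			cfg.onFallback()
		}
		return
	}
	if cfg.progressFn != nil {
		cfg.progressFn(total, total)
	}
}

func main() {
	progress := WithProgressFn(func(done, total int64) { fmt.Printf("%d/%d bytes\n", done, total) })
	fallback := WithOnFallback(func() { fmt.Println("falling back to docker push") })
	push(1024, false, progress, fallback)
	push(1024, true, progress, fallback)
}
```

Callers that need no callbacks simply pass no options, which is the main ergonomic gain over a variadic struct slice.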
* chore: remove dead HTTP/2 stream error handling and fix stale size comments
Remove isHTTP2StreamError() and its test cases, since HTTP/2 is no longer
possible now that all registry operations are forced onto HTTP/1.1. Fix stale
comments referencing 95 MB chunk size (now 96 MB), 50 MB multipart
threshold (now 128 MiB), and concurrency 4 (now 5).
* chore: clean up stale comments and clarify buffer pool reuse
- Remove outdated 'Requires registry client' comment from canOCIPush
(nil registry is no longer a concern)
- Simplify sanitizeError doc comment
- Add context to DefaultPushConcurrency matching Docker's default
- Document why pooled chunk buffers don't need zeroing
* chore: minor cleanup
---------
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>