Description
tl;dr: If layer X is pulled before layer Y and they share chunks, or even the full blob, pulling Y should skip fetching from the network whatever is already in local storage. But this doesn't seem to happen in partial-pulls mode if we use `ConcurrentBlobCopiesSemaphore` to sequence the layer pulls. The skip does happen with gzip, and it also happens if Y is from another image and we start its `copy.Image` after X has finished pulling. It seems that the semaphore doesn't establish a clear happens-before relationship between the layer pulls; perhaps some metadata updates are still being done concurrently.
I have the following Go program to test the performance of image pulling with various concurrency settings and in different compression formats:
```go
package main

import (
	"context"
	"flag"
	"fmt"
	"os"
	"strings"
	"sync"
	"time"

	"github.com/containers/image/v5/copy"
	"github.com/containers/image/v5/docker"
	"github.com/containers/image/v5/signature"
	"github.com/containers/image/v5/transports/alltransports"
	"github.com/containers/image/v5/types"
	"github.com/containers/storage"
	"github.com/containers/storage/pkg/reexec"
	"github.com/containers/storage/pkg/unshare"
	"github.com/sirupsen/logrus"
	"golang.org/x/sync/semaphore"
)

func main() {
	if reexec.Init() {
		return
	}
	unshare.MaybeReexecUsingUserNamespace(false)

	registryURL := flag.String("registry", "", "URL of the Docker registry")
	imageList := flag.String("images", "", "Comma-separated list of tagged image names")
	concurrency := flag.Int("concurrency", 3, "Concurrency factor for downloads")
	enablePartialImages := flag.Bool("enable-partial-images", false, "Whether to enable partial images")
	debugLogging := flag.Bool("debug", false, "Enable logrus debug logging")
	flag.Parse()

	if *debugLogging {
		logrus.SetLevel(logrus.DebugLevel)
	}
	logrus.SetFormatter(&logrus.TextFormatter{
		FullTimestamp: true,
	})

	if *registryURL == "" || *imageList == "" {
		fmt.Println("Registry URL and image list are required")
		flag.Usage()
		os.Exit(1)
	}

	images := strings.Split(*imageList, ",")
	for i, img := range images {
		images[i] = strings.TrimSpace(img)
	}

	// Set up a local containers/storage store; partial pulls (zstd:chunked)
	// are toggled via the -enable-partial-images flag.
	storeOptions, err := storage.DefaultStoreOptions()
	if err != nil {
		logrus.Fatalf("Failed to get default store options: %v", err)
	}
	storeOptions.GraphDriverName = "overlay"
	storeOptions.PullOptions = map[string]string{
		"enable_partial_images": fmt.Sprintf("%t", *enablePartialImages),
	}
	store, err := storage.GetStore(storeOptions)
	if err != nil {
		logrus.Fatalf("Failed to initialize storage: %v", err)
	}
	defer func() {
		if layers, err := store.Shutdown(false); err != nil {
			logrus.Errorf("Error shutting down storage: %v, still mounted layers: %v", err, layers)
		}
	}()

	policy := &signature.Policy{
		Default: []signature.PolicyRequirement{
			signature.NewPRInsecureAcceptAnything(),
		},
	}
	policyContext, err := signature.NewPolicyContext(policy)
	if err != nil {
		logrus.Fatalf("Failed to create policy context: %v", err)
	}
	defer func() {
		if err := policyContext.Destroy(); err != nil {
			logrus.Errorf("Error destroying policy context: %v", err)
		}
	}()

	// A single weighted semaphore shared by all copy.Image calls; with
	// -concurrency 1 this should serialize the individual blob copies.
	sem := semaphore.NewWeighted(int64(*concurrency))
	ctx := context.Background()
	sysCtx := &types.SystemContext{
		DockerInsecureSkipTLSVerify:       types.NewOptionalBool(true),
		OCIInsecureSkipTLSVerify:          true,
		DockerDaemonInsecureSkipTLSVerify: true,
	}

	startTime := time.Now()
	var wg sync.WaitGroup
	for _, imageName := range images {
		wg.Add(1)
		go func(img string) {
			defer wg.Done()
			srcRef, err := docker.ParseReference(fmt.Sprintf("//%s/%s", *registryURL, img))
			if err != nil {
				logrus.Errorf("Failed to parse reference for %s: %v", img, err)
				return
			}
			destRef, err := alltransports.ParseImageName(fmt.Sprintf("containers-storage:%s", img))
			if err != nil {
				logrus.Errorf("Failed to parse destination reference for %s: %v", img, err)
				return
			}
			options := &copy.Options{
				ReportWriter:                  os.Stdout,
				DestinationCtx:                sysCtx,
				SourceCtx:                     sysCtx,
				ConcurrentBlobCopiesSemaphore: sem,
			}
			logrus.Infof("Pulling %s from %s", img, *registryURL)
			_, err = copy.Image(ctx, policyContext, destRef, srcRef, options)
			if err != nil {
				logrus.Errorf("Failed to pull %s: %v", img, err)
				return
			}
			logrus.Infof("Successfully pulled %s", img)
		}(imageName)
	}
	wg.Wait()

	duration := time.Since(startTime)
	fmt.Printf("\nTotal pull operation completed in %s\n", duration)
}
```
I create a semaphore to limit concurrency, which I pass as `ConcurrentBlobCopiesSemaphore`.
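
For reference, this is the sequencing I expect a weight-1 semaphore to enforce (a minimal sketch of the `semaphore.Weighted` semantics, not of the actual `copy.Image` internals):

```go
package main

import (
	"context"
	"fmt"
	"sync"

	"golang.org/x/sync/semaphore"
)

// With weight 1, the second Acquire cannot return before the first
// Release. So if a layer's blob is fully committed to local storage
// before Release, the pull that acquires next should find it there.
func main() {
	ctx := context.Background()
	sem := semaphore.NewWeighted(1)

	var wg sync.WaitGroup
	for _, layer := range []string{"X", "Y"} {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			_ = sem.Acquire(ctx, 1)
			defer sem.Release(1)
			// Fetch + commit of layer l would happen here, mutually
			// exclusive with the other goroutine.
			fmt.Println("copying layer", l)
		}(layer)
	}
	wg.Wait()
}
```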
I have a local Docker registry which stores the ubuntu:24.10 image in zstd:chunked format and in gzip format:
```console
$ skopeo inspect --tls-verify=false docker://localhost:5000/ubuntu1:zstd --raw | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:e40b6e31bd8cba664c267ee7a21689b1f98585818942761ff1cb8cd3c2df323d",
    "size": 2296
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
      "digest": "sha256:0cc74fd753a0d2d2e8ebc140549a660831a99429f5b44275cc4dda88a90c7257",
      "size": 33180938,
      "annotations": {
        "io.github.containers.zstd-chunked.manifest-checksum": "sha256:a906ece44060de0be1de012cd50bcaceec3ac6bb680509c7ef8a1106a01365a4",
        "io.github.containers.zstd-chunked.manifest-position": "32649294:380035:1797807:1",
        "io.github.containers.zstd-chunked.tarsplit-position": "33029337:151529:3784936"
      }
    }
  ]
}
```

```console
$ skopeo inspect --tls-verify=false docker://localhost:5000/ubuntu1:gzip --raw | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:e40b6e31bd8cba664c267ee7a21689b1f98585818942761ff1cb8cd3c2df323d",
    "size": 2296
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:d14e9fd344e3d4bb8ae535902ba39e40cca2ec9ab404e4cc1bf911e088b5361c",
      "size": 32585328
    }
  ]
}
```
`ubuntu1:zstd` is also tagged as `ubuntu2:zstd` in the registry, and `ubuntu1:gzip` is tagged as `ubuntu2:gzip`.
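
(For completeness: the registry contents can be prepared roughly like this with skopeo. This is a reconstruction of my setup, not part of the script above, and assumes skopeo's `--format`/`--dest-compress-format` options:)

```console
$ skopeo copy --dest-tls-verify=false --format oci --dest-compress-format zstd:chunked \
    docker://docker.io/library/ubuntu:24.10 docker://localhost:5000/ubuntu1:zstd
$ skopeo copy --src-tls-verify=false --dest-tls-verify=false \
    docker://localhost:5000/ubuntu1:zstd docker://localhost:5000/ubuntu2:zstd
$ skopeo copy --dest-tls-verify=false --format oci --dest-compress-format gzip \
    docker://docker.io/library/ubuntu:24.10 docker://localhost:5000/ubuntu1:gzip
$ skopeo copy --src-tls-verify=false --dest-tls-verify=false \
    docker://localhost:5000/ubuntu1:gzip docker://localhost:5000/ubuntu2:gzip
```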
Baseline is pulling a single image:
```console
$ go run ./ -registry localhost:5000 -images ubuntu1:gzip -enable-partial-images -concurrency 1
INFO[2025-03-20T12:04:20+01:00] Pulling ubuntu1:gzip from localhost:5000
Getting image source signatures
Copying blob d14e9fd344e3 done |
Copying config e40b6e31bd done |
Writing manifest to image destination
INFO[2025-03-20T12:04:21+01:00] Successfully pulled ubuntu1:gzip

Total pull operation completed in 774.178588ms
```
I also measure how many bytes are transferred from the Docker registry. Here 0.030 GB was transferred, which is the size of this single-layer image. Makes sense.
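
(The transfer numbers come from reading interface byte counters before and after each run, roughly like the sketch below. The helper name and the hardcoded `lo` interface are my choices for illustration, not part of the script above.)

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rxBytes returns the receive-byte counter of a network interface as
// reported by /proc/net/dev. With a registry on localhost, loopback
// ("lo") carries the registry traffic.
func rxBytes(iface string) (uint64, error) {
	data, err := os.ReadFile("/proc/net/dev")
	if err != nil {
		return 0, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		name, rest, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok || name != iface {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			break
		}
		return strconv.ParseUint(fields[0], 10, 64)
	}
	return 0, fmt.Errorf("interface %q not found in /proc/net/dev", iface)
}

func main() {
	before, _ := rxBytes("lo")
	// ... run the pulls here ...
	after, _ := rxBytes("lo")
	fmt.Printf("%.3f GB transferred\n", float64(after-before)/1e9)
}
```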
The next experiment is to pull both `ubuntu1:gzip` and `ubuntu2:gzip`. I expect only one of them to actually pull the data over the network, and the other to reuse the blob, since we're limiting `imageCopier.copyLayer` concurrency to 1.
```console
$ go run ./ -registry localhost:5000 -images ubuntu1:gzip,ubuntu2:gzip -enable-partial-images -concurrency 1
INFO[2025-03-20T12:06:44+01:00] Pulling ubuntu2:gzip from localhost:5000
INFO[2025-03-20T12:06:44+01:00] Pulling ubuntu1:gzip from localhost:5000
Getting image source signatures
Getting image source signatures
Copying blob d14e9fd344e3 done |
Copying blob d14e9fd344e3 skipped: already exists
Copying config e40b6e31bd done |
Writing manifest to image destination
Copying config e40b6e31bd done |
Writing manifest to image destination
INFO[2025-03-20T12:06:45+01:00] Successfully pulled ubuntu1:gzip
INFO[2025-03-20T12:06:45+01:00] Successfully pulled ubuntu2:gzip

Total pull operation completed in 798.510689ms
```
Indeed, we see "already exists" in the logs, and my metrics show that 0.030 GB was transferred. Great.
Now let's do the same with zstd.
Baseline:
```console
$ go run ./ -registry localhost:5000 -images ubuntu1:zstd -enable-partial-images -concurrency 1
INFO[2025-03-20T12:07:35+01:00] Pulling ubuntu1:zstd from localhost:5000
Getting image source signatures
Copying blob 0cc74fd753a0 done 31.6MiB / 31.6MiB (skipped: 540.0b = 0.00%)
Copying config e40b6e31bd done |
Writing manifest to image destination
INFO[2025-03-20T12:07:36+01:00] Successfully pulled ubuntu1:zstd

Total pull operation completed in 844.650036ms
```
0.031 GB transferred. A bit worse than gzip, but that's OK: zstd:chunked embeds extra metadata in the blob (the annotations above show ~380 KB of chunked manifest plus ~152 KB of tar-split data, which roughly accounts for the difference), and maybe this small layer just doesn't compress that well with zstd.
But now the bad part:
```console
$ go run ./ -registry localhost:5000 -images ubuntu1:zstd,ubuntu2:zstd -enable-partial-images -concurrency 1
INFO[2025-03-20T12:08:37+01:00] Pulling ubuntu2:zstd from localhost:5000
INFO[2025-03-20T12:08:37+01:00] Pulling ubuntu1:zstd from localhost:5000
Getting image source signatures
Getting image source signatures
Copying blob 0cc74fd753a0 done 31.6MiB / 31.6MiB (skipped: 540.0b = 0.00%)
Copying blob 0cc74fd753a0 done 31.6MiB / 31.6MiB (skipped: 540.0b = 0.00%)
Copying config e40b6e31bd done |
Writing manifest to image destination
Copying config e40b6e31bd done |
Writing manifest to image destination
INFO[2025-03-20T12:08:38+01:00] Successfully pulled ubuntu2:zstd
INFO[2025-03-20T12:08:38+01:00] Successfully pulled ubuntu1:zstd

Total pull operation completed in 1.632340319s
```
As the logs show, the blob was pulled twice! And my metrics also show 0.062 GB transferred over the interface.

I also ran this experiment on a large (~2 GB compressed) image with ~10 layers: every single layer was pulled twice when using the zstd version, even with concurrency 1.
It's not that zstd:chunked is incapable of reusing layers or even chunks. It can, as we see if we serialize the operations at the user level, i.e. pull `ubuntu1:zstd` and then `ubuntu2:zstd` without clearing the local store in between:
```console
$ go run ./ -registry localhost:5000 -images ubuntu1:zstd -enable-partial-images -concurrency 1
INFO[2025-03-20T12:10:51+01:00] Pulling ubuntu1:zstd from localhost:5000
Getting image source signatures
Copying blob 0cc74fd753a0 done 31.6MiB / 31.6MiB (skipped: 540.0b = 0.00%)
Copying config e40b6e31bd done |
Writing manifest to image destination
INFO[2025-03-20T12:10:52+01:00] Successfully pulled ubuntu1:zstd

Total pull operation completed in 816.480472ms

$ go run ./ -registry localhost:5000 -images ubuntu2:zstd -enable-partial-images -concurrency 1
INFO[2025-03-20T12:10:54+01:00] Pulling ubuntu2:zstd from localhost:5000
Getting image source signatures
Copying blob 0cc74fd753a0 skipped: already exists
Copying config e40b6e31bd done |
Writing manifest to image destination
INFO[2025-03-20T12:10:54+01:00] Successfully pulled ubuntu2:zstd

Total pull operation completed in 84.912156ms
```
But perhaps when we rely on `ConcurrentBlobCopiesSemaphore` for serialization, something goes wrong underneath with the metadata updates.
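
If it helps triage: I'd expect serializing the entire `copy.Image` calls to avoid the double fetch, matching the sequential runs above. A sketch of that variant of the script (untested in this exact form; `copyMu` is a name introduced here):

```go
// Sketch: serialize whole image copies instead of only blob copies.
var copyMu sync.Mutex

// ...inside the per-image goroutine, replacing the plain copy.Image call:
copyMu.Lock()
_, err = copy.Image(ctx, policyContext, destRef, srcRef, options)
copyMu.Unlock()
if err != nil {
	logrus.Errorf("Failed to pull %s: %v", img, err)
	return
}
```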