@@ -482,8 +482,8 @@ While the `imagePullPolicy` is working on container level, the introduced
 values `IfNotPresent`, `Always` and `Never`, but will only pull once per pod.

 Technically it means that we need to pull in [`SyncPod`](https://github.com/kubernetes/kubernetes/blob/b498eb9/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1049)
-for OCI objects on a pod level and not during [`EnsureImageExists`](https://github.com/kubernetes/kubernetes/blob/b498eb9/pkg/kubelet/images/image_manager.go#L102)
-before the container gets started.
+for OCI objects on a pod level and not for each container during [`EnsureImageExists`](https://github.com/kubernetes/kubernetes/blob/b498eb9/pkg/kubelet/images/image_manager.go#L102)
+before they get started.

 If users want to re-pull artifacts when referencing moving tags like `latest`,
 then they need to restart / evict the pod.
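
The pod-level pull described in this hunk can be illustrated with a minimal Go sketch. It is not the kubelet implementation: `ociVolume`, `pullFunc` and `pullPodVolumes` are hypothetical names, and the pull callback merely stands in for the `PullImage` RPC. The point is only that the loop runs over the pod's volumes, so the number of pulls does not grow with the number of containers mounting a volume.

```go
package main

import "fmt"

// ociVolume is a hypothetical, simplified stand-in for a pod-level OCI volume.
type ociVolume struct {
	Name      string // volume name referenced by containers
	Reference string // OCI reference, for example "localhost:5000/image:v1"
}

// pullFunc abstracts the PullImage RPC; its signature is an assumption.
type pullFunc func(reference string) (imageRef string, err error)

// pullPodVolumes resolves every OCI volume of a pod exactly once, no matter
// how many containers mount it, and maps volume names to pulled image refs.
func pullPodVolumes(volumes []ociVolume, pull pullFunc) (map[string]string, error) {
	refs := make(map[string]string, len(volumes))
	for _, v := range volumes {
		imageRef, err := pull(v.Reference)
		if err != nil {
			return nil, fmt.Errorf("pulling %q for volume %q: %w", v.Reference, v.Name, err)
		}
		refs[v.Name] = imageRef
	}
	return refs, nil
}

func main() {
	pulls := 0
	fakePull := func(reference string) (string, error) {
		pulls++
		return "resolved:" + reference, nil
	}

	volumes := []ociVolume{{Name: "data", Reference: "localhost:5000/image:v1"}}
	containers := []string{"init", "app", "sidecar"} // all of them mount "data"

	refs, err := pullPodVolumes(volumes, fakePull)
	if err != nil {
		panic(err)
	}
	for _, c := range containers {
		fmt.Printf("container %s uses %s\n", c, refs["data"])
	}
	fmt.Println("pulls:", pulls)
}
```

In this sketch three containers share one volume and the fake pull is invoked a single time; a per-container approach along the lines of `EnsureImageExists` would have invoked it once per container.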
@@ -500,50 +500,44 @@ container image.
 #### CRI

 The CRI API is already capable of managing container images [via the `ImageService`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L146-L161).
-Those RPCs will be re-used for managing OCI artifacts, while the [`ImageSpec`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L798-L813)
-as well as [`PullImageResponse`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L1530-L1534)
-will be extended to mount the OCI object to a local path:
+Those RPCs will be re-used for managing OCI artifacts, while the [`Mount`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L220-L247)
+message will be extended to mount an OCI object using the existing [`ImageSpec`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L798-L813)
+on container creation:

 ```protobuf
-
-// ImageSpec is an internal representation of an image.
-message ImageSpec {
-    // …
-
-    // Indicate that the OCI object should be mounted.
-    bool mount = 20;
-
-    // SELinux label to be used.
-    string mount_label = 21;
-}
-
-message PullImageResponse {
+// Mount specifies a host volume to mount into a container.
+message Mount {
     // …

-    // Absolute local path where the OCI object got mounted.
-    string mountpoint = 2;
+    // Mount an image reference (image ID, with or without digest), which is a
+    // special use case for OCI volume mounts. If this field is set, then
+    // host_path should be unset. All OCI mounts are per feature definition
+    // readonly. The kubelet does a PullImage RPC and evaluates the returned
+    // PullImageResponse.image_ref value, which is then set to the
+    // ImageSpec.image field. Runtimes are expected to mount the image as
+    // required.
+    // Introduced in the OCI Volume Source KEP: https://kep.k8s.io/4639
+    ImageSpec image = 9;
 }
 ```

 This allows to re-use the existing kubelet logic for managing the OCI objects,
 with the caveat that the new `VolumeSource` won't be isolated in a dedicated
 plugin as part of the existing [volume manager](https://github.com/kubernetes/kubernetes/tree/6d0aab2/pkg/kubelet/volumemanager).

-The added `mount_label` allow the kubelet to support SELinux contexts.
+Runtimes are already aware of the correct SELinux parameters during container
+creation and will re-use them for the OCI object mounts.

-The kubelet will use the `mountpoint` on container creation
-(by calling the `CreateContainer` RPC) to indicate the additional required volume mount ([`ContainerConfig.Mount`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L1102))
-from the runtime. The runtime needs to ensure that mount and also manages its
-lifecycle, for example to remove the bind mount on container removal.
+The kubelet will use the `PullImageResponse.image_ref` returned by the pull and set
+it as `Mount.image.image` together with the other fields for `Mount.image`. The
+runtime will then mount the OCI object directly on container creation, assuming
+it's already present on disk. The runtime also manages the lifecycle of the
+mount, for example by removing the OCI bind mount on container removal as well as
+the object mount on the `RemoveImage` RPC.
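
How the kubelet could turn a pull result into the proposed `Mount.image` field can be sketched in Go as follows. The struct types are simplified local stand-ins for the CRI messages, not the generated `cri-api` code, and `buildImageMount` as well as the `/volume` container path are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the CRI messages; not the generated cri-api types.
type ImageSpec struct {
	Image string
}

type PullImageResponse struct {
	ImageRef string
}

type Mount struct {
	ContainerPath string
	HostPath      string
	Readonly      bool
	Image         *ImageSpec // the proposed `ImageSpec image = 9` field
}

// buildImageMount is a hypothetical helper: it copies the image reference
// returned by the pull into Mount.Image.Image and leaves HostPath unset.
func buildImageMount(resp *PullImageResponse, containerPath string) (*Mount, error) {
	if resp == nil || resp.ImageRef == "" {
		return nil, errors.New("pull returned no image reference")
	}
	if containerPath == "" {
		return nil, errors.New("container path must not be empty")
	}
	return &Mount{
		ContainerPath: containerPath,
		Readonly:      true, // OCI mounts are read-only per feature definition
		Image:         &ImageSpec{Image: resp.ImageRef},
		// HostPath intentionally stays empty because Image is set.
	}, nil
}

func main() {
	resp := &PullImageResponse{
		ImageRef: "localhost:5000/image@sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9",
	}
	m, err := buildImageMount(resp, "/volume")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s (readonly=%t, host_path=%q)\n",
		m.Image.Image, m.ContainerPath, m.Readonly, m.HostPath)
}
```

Under these assumptions, the runtime receives everything it needs in the `CreateContainer` request and no mount point has to travel back to the kubelet.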

 The kubelet tracks the information about which OCI object is used by which
-sandbox and therefore manages the lifecycle of them.
-
-The proposal also considers smaller CRI changes, for example to add a list of
-mounted volume paths to the `ImageStatusResponse.Image` message returned by the
-`ImageStatus` RPC. This allows providing the right amount of information between
-the kubelet and the runtime to ensure that no context gets lost in restart
-scenarios.
+sandbox and therefore manages their lifecycle for garbage collection
+purposes.
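
The bookkeeping behind that statement might look like the following Go sketch. It is an assumption about one possible shape of the tracking, not the kubelet's actual data structure; `usageTracker` and its methods are hypothetical names.

```go
package main

import "fmt"

// usageTracker is a hypothetical in-memory index from an OCI object's image
// reference to the set of sandboxes that currently mount it.
type usageTracker struct {
	sandboxesByImage map[string]map[string]struct{}
}

func newUsageTracker() *usageTracker {
	return &usageTracker{sandboxesByImage: make(map[string]map[string]struct{})}
}

// use records that sandboxID mounts the OCI object identified by imageRef.
func (t *usageTracker) use(imageRef, sandboxID string) {
	if t.sandboxesByImage[imageRef] == nil {
		t.sandboxesByImage[imageRef] = make(map[string]struct{})
	}
	t.sandboxesByImage[imageRef][sandboxID] = struct{}{}
}

// release drops every record of a removed sandbox.
func (t *usageTracker) release(sandboxID string) {
	for imageRef, sandboxes := range t.sandboxesByImage {
		delete(sandboxes, sandboxID)
		if len(sandboxes) == 0 {
			delete(t.sandboxesByImage, imageRef)
		}
	}
}

// collectable reports whether no sandbox uses imageRef anymore, so that
// image garbage collection may remove the object.
func (t *usageTracker) collectable(imageRef string) bool {
	return len(t.sandboxesByImage[imageRef]) == 0
}

func main() {
	tracker := newUsageTracker()
	tracker.use("localhost:5000/image:v1", "sandbox-a")
	tracker.use("localhost:5000/image:v1", "sandbox-b")

	tracker.release("sandbox-a")
	fmt.Println("collectable after first release:", tracker.collectable("localhost:5000/image:v1"))

	tracker.release("sandbox-b")
	fmt.Println("collectable after second release:", tracker.collectable("localhost:5000/image:v1"))
}
```

An object only becomes a garbage collection candidate once the last sandbox using it has been released, which is the property the paragraph above relies on.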

 The overall flow for container creation will look like this:

@@ -554,32 +548,30 @@ sequenceDiagram
     Note left of K: During pod sync
     Note over K,C: CRI
     K->>+C: RPC: PullImage
-    Note right of C: Pull and mount<br/>OCI object
-    C-->>-K: PullImageResponse.Mountpoint
+    Note right of C: Pull OCI object
+    C-->>-K: PullImageResponse.image_ref
     Note left of K: Add mount points<br/> to container<br/>creation request
     K->>+C: RPC: CreateContainer
-    Note right of C: Add bind mounts<br/>from object mount<br/>point to container
+    Note right of C: Mount OCI object
+    Note right of C: Add OCI bind mounts<br/>from OCI object<br/>to container
     C-->>-K: CreateContainerResponse
 ```

 1. **Kubelet Initiates Image Pull**:
    - During pod setup, the kubelet initiates the pull for the OCI object based on the volume source.
-   - The kubelet passes the necessary indicator to mount the object to the container runtime.

 2. **Runtime Handles Mounting**:
-   - The container runtime mounts the OCI object as a filesystem using the metadata provided by the kubelet.
-   - The runtime returns the mount point information to the kubelet.
+   - The runtime returns the image reference information to the kubelet.

 3. **Redirecting of the Mountpoint**:
-   - The kubelet uses the returned mount point to build the container creation request for each container using that mount.
-   - The kubelet initiates the container creation and the runtime creates the required bind mounts to the target location.
+   - The kubelet uses the returned image reference to build the container creation request for each container using that mount.
+   - The kubelet initiates the container creation and the runtime creates the required OCI object mount as well as bind mounts to the target location.
      This is the current implemented behavior for all other mounts and should require no actual container runtime code change.

 4. **Lifecycle Management**:
    - The container runtime manages the lifecycle of the mounts, ensuring they are created during pod setup and cleaned up upon sandbox removal.

 5. **Tracking and Coordination**:
-   - The kubelet and runtime coordinate to track pods requesting mounts to avoid removing containers with volumes in use.
    - During image garbage collection, the runtime provides the kubelet with the necessary mount information to ensure proper cleanup.

 6. **SELinux Context Handling**:
@@ -597,19 +589,17 @@ sequenceDiagram

 #### Container Runtimes

-Container runtimes need to support the new `mount` field, otherwise the
-feature cannot be used. The kubelet will verify if the returned `mountpoint`
-actually exists on disk to check the feature availability, because Protobuf will
-strip the field in a backwards compatible way for older runtimes. Pods using the
-new `VolumeSource` combined with a not supported container runtime version will
-fail to run on the node.
+Container runtimes need to support the new `Mount.image` field, otherwise the
+feature cannot be used. Pods using the new `VolumeSource` combined with an
+unsupported container runtime version will fail to run on the node, because the
+`Mount.host_path` field is not set for those mounts.

 For security reasons, volume mounts should set the [`noexec`] and `ro`
 (read-only) options by default.

 ##### Filesystem representation

-Container Runtimes are expected to return a `mountpoint`, which is a single
+Container Runtimes are expected to manage a `mountpoint`, which is a single
 directory containing the unpacked (in case of tarballs) and merged layer files
 from the image or artifact. If an OCI artifact has multiple layers (in the same
 way as for container images), then the runtime is expected to merge them
-The container runtime can now pull the artifact with the `mount = true` CRI
-field set, for example using an experimental [`crictl pull --mount` flag](https://github.com/kubernetes-sigs/cri-tools/compare/master...saschagrunert:oci-volumesource-poc):
-
-```bash
-sudo crictl pull --mount localhost:5000/image:v1
-```
-
-```console
-Image is up to date for localhost:5000/image@sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9
-Image mounted to: /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged
-```
-
-And the returned `mountpoint` contains the unpacked layers as directory tree:
-
-```bash
-sudo tree /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged