Skip to content

Commit 65aa700

Browse files
committed
[1.1] runc run: fix mount leak
When preparing to mount container root, we need to make its parent mount private (i.e. disable propagation), otherwise the new in-container mounts are leaked to the host. To find a parent mount, we use to read mountinfo and find the longest entry which can be a parent of the container root directory. Unfortunately, due to kernel bug in all Linux kernels older than v5.8 (see [1], [2]), sometimes mountinfo can't be read in its entirety. In this case, getParentMount may occasionally return a wrong parent mount. As a result, we do not change the mount propagation to private, and container mounts are leaked. Alas, we can not fix the kernel, and reading mountinfo a few times to ensure its consistency (like it's done in, say, Kubernetes) does not look like a good solution for performance reasons. Fortunately, we don't need mountinfo. Let's just traverse the directory tree, trying to remount it private until we find a mount point (any error other than EINVAL means we just found it). Fixes issue 2404. [1]: https://github.com/kolyshkin/procfs-test [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 13a6f56) Signed-off-by: Kir Kolyshkin <[email protected]>
1 parent a4cebd3 commit 65aa700

File tree

2 files changed

+28
-46
lines changed

2 files changed

+28
-46
lines changed

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
66

77
## [Unreleased 1.1.z]
88

9+
### Fixed
10+
11+
* On a system with older kernel, reading `/proc/self/mountinfo` may skip some
12+
entries, as a consequence runc may not properly set mount propagation,
13+
causing container mounts leak onto the host mount namespace. (#2404, #4425)
14+
915
## [1.1.14] - 2024-09-03
1016

1117
> 年を取っていいことは、驚かなくなることね。

libcontainer/rootfs_linux.go

Lines changed: 22 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -801,54 +801,33 @@ func mknodDevice(dest string, node *devices.Device) error {
801801
return os.Chown(dest, int(node.Uid), int(node.Gid))
802802
}
803803

804-
// Get the parent mount point of directory passed in as argument. Also return
805-
// optional fields.
806-
func getParentMount(rootfs string) (string, string, error) {
807-
mi, err := mountinfo.GetMounts(mountinfo.ParentsFilter(rootfs))
808-
if err != nil {
809-
return "", "", err
810-
}
811-
if len(mi) < 1 {
812-
return "", "", fmt.Errorf("could not find parent mount of %s", rootfs)
813-
}
814-
815-
// find the longest mount point
816-
var idx, maxlen int
817-
for i := range mi {
818-
if len(mi[i].Mountpoint) > maxlen {
819-
maxlen = len(mi[i].Mountpoint)
820-
idx = i
804+
// rootfsParentMountPrivate ensures rootfs parent mount is private.
805+
// This is needed for two reasons:
806+
// - pivot_root() will fail if parent mount is shared;
807+
// - when we bind mount rootfs, if its parent is not private, the new mount
808+
// will propagate (leak!) to parent namespace and we don't want that.
809+
func rootfsParentMountPrivate(path string) error {
810+
var err error
811+
// Assuming path is absolute and clean (this is checked in
812+
// libcontainer/validate). Any error other than EINVAL means we failed,
813+
// and EINVAL means this is not a mount point, so traverse up until we
814+
// find one.
815+
for {
816+
err = unix.Mount("", path, "", unix.MS_PRIVATE, "")
817+
if err == nil {
818+
return nil
821819
}
822-
}
823-
return mi[idx].Mountpoint, mi[idx].Optional, nil
824-
}
825-
826-
// Make parent mount private if it was shared
827-
func rootfsParentMountPrivate(rootfs string) error {
828-
sharedMount := false
829-
830-
parentMount, optionalOpts, err := getParentMount(rootfs)
831-
if err != nil {
832-
return err
833-
}
834-
835-
optsSplit := strings.Split(optionalOpts, " ")
836-
for _, opt := range optsSplit {
837-
if strings.HasPrefix(opt, "shared:") {
838-
sharedMount = true
820+
if err != unix.EINVAL || path == "/" { //nolint:errorlint // unix errors are bare
839821
break
840822
}
823+
path = filepath.Dir(path)
841824
}
842-
843-
// Make parent mount PRIVATE if it was shared. It is needed for two
844-
// reasons. First of all pivot_root() will fail if parent mount is
845-
// shared. Secondly when we bind mount rootfs it will propagate to
846-
// parent namespace and we don't want that to happen.
847-
if sharedMount {
848-
return mount("", parentMount, "", "", unix.MS_PRIVATE, "")
825+
return &mountError{
826+
op: "remount-private",
827+
target: path,
828+
flags: unix.MS_PRIVATE,
829+
err: err,
849830
}
850-
851-
return nil
852831
}
853832

854833
func prepareRoot(config *configs.Config) error {
@@ -860,9 +839,6 @@ func prepareRoot(config *configs.Config) error {
860839
return err
861840
}
862841

863-
// Make parent mount private to make sure following bind mount does
864-
// not propagate in other namespaces. Also it will help with kernel
865-
// check pass in pivot_root. (IS_SHARED(new_mnt->mnt_parent))
866842
if err := rootfsParentMountPrivate(config.Rootfs); err != nil {
867843
return err
868844
}

0 commit comments

Comments
 (0)