rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT

The original reasoning for this option was to avoid having mount options
be overwritten by runc. However, adding command-line arguments has
historically been a bad idea because it forces strict-runc-compatible
OCI runtimes to copy out-of-spec features directly from runc and these
flags are usually quite difficult to enable by users when using runc
through several layers of engines and orchestrators.

A far more preferable solution is to have a heuristic which detects
whether copying the original mount's mount options would override an
explicit mount option specified by the user. In this case, we should
return an error. You only end up in this path in the userns case, if you
have a bind-mount source with locked flags.

During the course of writing this patch, I discovered that several
aspects of our handling of flags for bind-mounts left much to be
desired. We have completely botched the handling of explicitly cleared
flags since commit 97f5ee4e6a ("Only remount if requested flags differ
from current"), with our behaviour only becoming increasingly more weird
with 50105de1d8 ("Fix failure with rw bind mount of a ro fuse") and
da780e4d27 ("Fix bind mounts of filesystems with certain options
set"). In short, we would only clear flags explicitly request by the
user purely by chance, in ways that it really should've been reported to
us by now. The most egregious is that mounts explicitly marked "rw" were
actually mounted "ro" if the bind-mount source was "ro" and no other
special flags were included. In addition, our handling of atime was
completely broken -- mostly due to how subtle the semantics of atime are
on Linux.

Unfortunately, while the runtime-spec requires us to implement
mount(8)'s behaviour, several aspects of the util-linux mount(8)'s
behaviour are broken and thus copying them makes little sense. Since the
runtime-spec behaviour for this case (should mount options for a "bind"
mount use the "mount --bind -o ..." or "mount --bind -o remount,..."
semantics? Is the fallback code we have for userns actually
spec-compliant?) and the mount(8) behaviour (see [1]) are not
well-defined, this commit simply fixes the most obvious aspects of the
behaviour that are broken while keeping the current spirit of the
implementation.

NOTE: The handling of atime in the base case is left for a future PR to
deal with. This means that the atime of the source mount will be
silently left alone unless the fallback path needs to be taken, and any
flags not explicitly set will be cleared in the base case. Whether we
should always be operating as "mount --bind -o remount,..." (where we
default to the original mount source flags) is a topic for a separate PR
and (probably) associated runtime-spec PR.

So, to resolve this:

* We store which flags were explicitly requested to be cleared by the
  user, so that we can detect whether the userns fallback path would end
  up setting a flag the user explicitly wished to clear. If so, we
  return an error because we couldn't fulfil the configuration settings.

* Revert 97f5ee4e6a ("Only remount if requested flags differ from
  current"), as missing flags do not mean we can skip MS_REMOUNT (in
  fact, missing flags are how you indicate a flag needs to be cleared
  with mount(2)). The original purpose of the patch was to fix the
  userns issue, but as mentioned above the correct mechanism is to do a
  fallback mount that copies the lockable flags from statfs(2).

* Improve handling of atime in the fallback case by:
    - Correctly handling the returned flags in statfs(2).
    - Implement the MNT_LOCK_ATIME checks in our code to ensure we
      produce errors rather than silently producing incorrect atime
      mounts.

* Improve the tests so we correctly detect all of these contingencies,
  including a general "bind-mount atime handling" test to ensure that
  the behaviour described here is accurate.

This change also inlines the remount() function -- it was only ever used
for the bind-mount remount case, and its behaviour is very bind-mount
specific.

[1]: https://github.com/util-linux/util-linux/issues/2433

Reverts: 97f5ee4e6a ("Only remount if requested flags differ from current")
Fixes: 50105de1d8 ("Fix failure with rw bind mount of a ro fuse")
Fixes: da780e4d27 ("Fix bind mounts of filesystems with certain options set")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This commit is contained in:
Aleksa Sarai
2023-08-06 13:21:05 +10:00
parent 153865d0fe
commit 7c71a22705
10 changed files with 542 additions and 129 deletions

View File

@@ -461,7 +461,6 @@ _runc_run() {
--no-subreaper
--no-pivot
--no-new-keyring
--no-mount-fallback
"
local options_with_args="
@@ -568,7 +567,6 @@ _runc_create() {
--help
--no-pivot
--no-new-keyring
--no-mount-fallback
"
local options_with_args="
@@ -629,7 +627,6 @@ _runc_restore() {
--no-pivot
--auto-dedup
--lazy-pages
--no-mount-fallback
"
local options_with_args="

View File

@@ -51,10 +51,6 @@ command(s) that get executed on start, edit the args parameter of the spec. See
Name: "preserve-fds",
Usage: "Pass N additional file descriptors to the container (stdio + $LISTEN_FDS + N in total)",
},
cli.BoolFlag{
Name: "no-mount-fallback",
Usage: "Do not fallback when the specific configuration is not applicable (e.g., do not try to remount a bind mount again after the first attempt failed on source filesystems that have nodev, noexec, nosuid, noatime, relatime, strictatime, nodiratime set)",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {

View File

@@ -214,10 +214,6 @@ type Config struct {
// When RootlessCgroups is set, cgroups errors are ignored.
RootlessCgroups bool `json:"rootless_cgroups,omitempty"`
// Do not try to remount a bind mount again after the first attempt failed on source
// filesystems that have nodev, noexec, nosuid, noatime, relatime, strictatime, nodiratime set
NoMountFallback bool `json:"no_mount_fallback,omitempty"`
// TimeOffsets specifies the offset for supporting time namespaces.
TimeOffsets map[string]specs.LinuxTimeOffset `json:"time_offsets,omitempty"`

View File

@@ -15,6 +15,10 @@ type Mount struct {
// Mount flags.
Flags int `json:"flags"`
// Mount flags that were explicitly cleared in the configuration (meaning
// the user explicitly requested that these flags *not* be set).
ClearedFlags int `json:"cleared_flags"`
// Propagation Flags
PropagationFlags []int `json:"propagation_flags"`

View File

@@ -34,7 +34,6 @@ type mountConfig struct {
cgroup2Path string
rootlessCgroups bool
cgroupns bool
noMountFallback bool
}
// mountEntry contains mount data specific to a mount point.
@@ -83,7 +82,6 @@ func prepareRootfs(pipe *syncSocket, iConfig *initConfig, mountFds mountFds) (er
cgroup2Path: iConfig.Cgroup2Path,
rootlessCgroups: iConfig.RootlessCgroups,
cgroupns: config.Namespaces.Contains(configs.NEWCGROUP),
noMountFallback: config.NoMountFallback,
}
for i, m := range config.Mounts {
entry := mountEntry{Mount: m}
@@ -409,6 +407,51 @@ func doTmpfsCopyUp(m mountEntry, rootfs, mountLabel string) (Err error) {
})
}
const (
// The atime "enum" flags (which are mutually exclusive).
mntAtimeEnumFlags = unix.MS_NOATIME | unix.MS_RELATIME | unix.MS_STRICTATIME
// All atime-related flags.
mntAtimeFlags = mntAtimeEnumFlags | unix.MS_NODIRATIME
// Flags which can be locked when inheriting mounts in a different userns.
// In the kernel, these are the mounts that are locked using MNT_LOCK_*.
mntLockFlags = unix.MS_RDONLY | unix.MS_NODEV | unix.MS_NOEXEC |
unix.MS_NOSUID | mntAtimeFlags
)
func statfsToMountFlags(st unix.Statfs_t) int {
// From <linux/statfs.h>.
const ST_NOSYMFOLLOW = 0x2000 //nolint:revive
var flags int
for _, f := range []struct {
st, ms int
}{
// See calculate_f_flags() in fs/statfs.c.
{unix.ST_RDONLY, unix.MS_RDONLY},
{unix.ST_NOSUID, unix.MS_NOSUID},
{unix.ST_NODEV, unix.MS_NODEV},
{unix.ST_NOEXEC, unix.MS_NOEXEC},
{unix.ST_MANDLOCK, unix.MS_MANDLOCK},
{unix.ST_SYNCHRONOUS, unix.MS_SYNCHRONOUS},
{unix.ST_NOATIME, unix.MS_NOATIME},
{unix.ST_NODIRATIME, unix.MS_NODIRATIME},
{unix.ST_RELATIME, unix.MS_RELATIME},
{ST_NOSYMFOLLOW, unix.MS_NOSYMFOLLOW},
// There is no ST_STRICTATIME -- see below.
} {
if int(st.Flags)&f.st == f.st {
flags |= f.ms
}
}
// MS_STRICTATIME is a "fake" MS_* flag. It isn't stored in mnt->mnt_flags,
// and so it doesn't show up in statfs(2). If none of the other flags in
// atime enum are present, the mount is MS_STRICTATIME.
if flags&mntAtimeEnumFlags == 0 {
flags |= unix.MS_STRICTATIME
}
return flags
}
func mountToRootfs(c *mountConfig, m mountEntry) error {
rootfs := c.root
@@ -509,11 +552,97 @@ func mountToRootfs(c *mountConfig, m mountEntry) error {
return err
}
}
// bind mount won't change mount options, we need remount to make mount options effective.
// first check that we have non-default options required before attempting a remount
if m.Flags&^(unix.MS_REC|unix.MS_REMOUNT|unix.MS_BIND) != 0 {
// only remount if unique mount options are set
if err := remount(m, rootfs, c.noMountFallback); err != nil {
// The initial MS_BIND won't change the mount options, we need to do a
// separate MS_BIND|MS_REMOUNT to apply the mount options. We skip
// doing this if the user has not specified any mount flags at all
// (including cleared flags) -- in which case we just keep the original
// mount flags.
//
// Note that the fact we check whether any clearing flags are set is in
// contrast to mount(8)'s current behaviour, but is what users probably
// expect. See <https://github.com/util-linux/util-linux/issues/2433>.
if m.Flags & ^(unix.MS_BIND|unix.MS_REC|unix.MS_REMOUNT) != 0 || m.ClearedFlags != 0 {
if err := utils.WithProcfd(rootfs, m.Destination, func(dstFD string) error {
flags := m.Flags | unix.MS_BIND | unix.MS_REMOUNT
// The runtime-spec says we SHOULD map to the relevant mount(8)
// behaviour. However, it's not clear whether we want the
// "mount --bind -o ..." or "mount --bind -o remount,..."
// behaviour here -- both of which are somewhat broken[1].
//
// So, if the user has passed "remount" as a mount option, we
// implement the "mount --bind -o remount" behaviour, otherwise
// we implement the spiritual intent of the "mount --bind -o"
// behaviour, which should match what users expect. Maybe
// mount(8) will eventually implement this behaviour too..
//
// [1]: https://github.com/util-linux/util-linux/issues/2433
// Initially, we emulate "mount --bind -o ..." where we set
// only the requested flags (clearing any existing flags). The
// only difference from mount(8) is that we do this
// unconditionally, regardless of whether any set-me mount
// options have been requested.
//
// TODO: We are not doing any special handling of the atime
// flags here, which means that the mount will inherit the old
// atime flags if the user didn't explicitly request a
// different set of flags. This also has the mount(8) bug where
// "nodiratime,norelatime" will result in a
// "nodiratime,relatime" mount.
mountErr := mountViaFDs("", nil, m.Destination, dstFD, "", uintptr(flags), "")
if mountErr == nil {
return nil
}
// If the mount failed, the mount may contain locked mount
// flags. In that case, we emulate "mount --bind -o
// remount,...", where we take the existing mount flags of the
// mount and apply the request flags (including clearing flags)
// on top. The main divergence we have from mount(8) here is
// that we handle atimes correctly to make sure we error out if
// we cannot fulfil the requested mount flags.
var st unix.Statfs_t
if err := unix.Statfs(m.src(), &st); err != nil {
return &os.PathError{Op: "statfs", Path: m.src(), Err: err}
}
srcFlags := statfsToMountFlags(st)
// If the user explicitly request one of the locked flags *not*
// be set, we need to return an error to avoid producing mounts
// that don't match the user's request.
if srcFlags&m.ClearedFlags&mntLockFlags != 0 {
return mountErr
}
// If an MS_*ATIME flag was requested, it must match the
// existing one. This handles two separate kernel bugs, and
// matches the logic of can_change_locked_flags() but without
// these bugs:
//
// * (2.6.30+) Since commit 613cbe3d4870 ("Don't set relatime
// when noatime is specified"), MS_RELATIME is ignored when
// MS_NOATIME is set. This means that us inheriting MS_NOATIME
// from a mount while requesting MS_RELATIME would *silently*
// produce an MS_NOATIME mount.
//
// * (2.6.30+) Since its introduction in commit d0adde574b84
// ("Add a strictatime mount option"), MS_STRICTATIME has
// caused any passed MS_RELATIME and MS_NOATIME flags to be
// ignored which results in us *silently* producing
// MS_STRICTATIME mounts even if the user requested MS_RELATIME
// or MS_NOATIME.
if m.Flags&mntAtimeFlags != 0 && m.Flags&mntAtimeFlags != srcFlags&mntAtimeFlags {
return mountErr
}
// Retry the mount with the existing lockable mount flags
// applied.
flags |= srcFlags & mntLockFlags
mountErr = mountViaFDs("", nil, m.Destination, dstFD, "", uintptr(flags), "")
logrus.Debugf("remount retry: srcFlags=0x%x flagsSet=0x%x flagsClr=0x%x: %v", srcFlags, m.Flags, m.ClearedFlags, mountErr)
return mountErr
}); err != nil {
return err
}
}
@@ -1103,37 +1232,6 @@ func writeSystemProperty(key, value string) error {
return os.WriteFile(path.Join("/proc/sys", keyPath), []byte(value), 0o644)
}
func remount(m mountEntry, rootfs string, noMountFallback bool) error {
return utils.WithProcfd(rootfs, m.Destination, func(dstFD string) error {
flags := uintptr(m.Flags | unix.MS_REMOUNT)
err := mountViaFDs("", nil, m.Destination, dstFD, m.Device, flags, "")
if err == nil {
return nil
}
// Check if the source has flags set according to noMountFallback
src := m.src()
var s unix.Statfs_t
if err := unix.Statfs(src, &s); err != nil {
return &os.PathError{Op: "statfs", Path: src, Err: err}
}
var checkflags int
if noMountFallback {
// Check for ro only
checkflags = unix.MS_RDONLY
} else {
// Check for ro, nodev, noexec, nosuid, noatime, relatime, strictatime,
// nodiratime
checkflags = unix.MS_RDONLY | unix.MS_NODEV | unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NOATIME | unix.MS_RELATIME | unix.MS_STRICTATIME | unix.MS_NODIRATIME
}
if int(s.Flags)&checkflags == 0 {
return err
}
// ... and retry the mount with flags found above.
flags |= uintptr(int(s.Flags) & checkflags)
return mountViaFDs("", nil, m.Destination, dstFD, m.Device, flags, "")
})
}
// Do the mount operation followed by additional mounts required to take care
// of propagation flags. This will always be scoped inside the container rootfs.
func mountPropagate(m mountEntry, rootfs string, mountLabel string) error {

View File

@@ -313,7 +313,6 @@ type CreateOpts struct {
Spec *specs.Spec
RootlessEUID bool
RootlessCgroups bool
NoMountFallback bool
}
// getwd is a wrapper similar to os.Getwd, except it always gets
@@ -359,7 +358,6 @@ func CreateLibcontainerConfig(opts *CreateOpts) (*configs.Config, error) {
NoNewKeyring: opts.NoNewKeyring,
RootlessEUID: opts.RootlessEUID,
RootlessCgroups: opts.RootlessCgroups,
NoMountFallback: opts.NoMountFallback,
}
for _, m := range spec.Mounts {
@@ -977,10 +975,15 @@ func parseMountOptions(options []string) *configs.Mount {
// or the flag is not supported on the platform,
// then it is a data value for a specific fs type.
if f, exists := mountFlags[o]; exists && f.flag != 0 {
// FIXME: The *atime flags are special (they are more of an enum
// with quite hairy semantics) and thus arguably setting some of
// them should clear unrelated flags.
if f.clear {
m.Flags &= ^f.flag
m.ClearedFlags |= f.flag
} else {
m.Flags |= f.flag
m.ClearedFlags &= ^f.flag
}
} else if f, exists := mountPropagationMapping[o]; exists && f != 0 {
m.PropagationFlags = append(m.PropagationFlags, f)

View File

@@ -98,10 +98,6 @@ using the runc checkpoint command.`,
Value: "",
Usage: "Specify an LSM mount context to be used during restore.",
},
cli.BoolFlag{
Name: "no-mount-fallback",
Usage: "Do not fallback when the specific configuration is not applicable (e.g., do not try to remount a bind mount again after the first attempt failed on source filesystems that have nodev, noexec, nosuid, noatime, relatime, strictatime, nodiratime set)",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {

4
run.go
View File

@@ -64,10 +64,6 @@ command(s) that get executed on start, edit the args parameter of the spec. See
Name: "preserve-fds",
Usage: "Pass N additional file descriptors to the container (stdio + $LISTEN_FDS + N in total)",
},
cli.BoolFlag{
Name: "no-mount-fallback",
Usage: "Do not fallback when the specific configuration is not applicable (e.g., do not try to remount a bind mount again after the first attempt failed on source filesystems that have nodev, noexec, nosuid, noatime, relatime, strictatime, nodiratime set)",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {

View File

@@ -4,7 +4,6 @@ load helpers
function setup() {
setup_busybox
update_config '.process.args = ["/bin/echo", "Hello World"]'
}
function teardown() {
@@ -18,96 +17,425 @@ function teardown() {
teardown_bundle
}
function sshfs_has_flag() {
if [ -v DIR ]; then
awk '$2 == "'"$DIR"'" { print $4 }' </proc/self/mounts | grep -E "\b$1\b"
return "$?"
fi
}
function setup_sshfs() {
# Create a fuse-sshfs mount; skip the test if it's not working.
# Create a fuse-sshfs mount (or, failing that, a tmpfs mount).
local sshfs="sshfs
-o UserKnownHostsFile=/dev/null
-o StrictHostKeyChecking=no
-o PasswordAuthentication=no"
DIR="$BATS_RUN_TMPDIR/fuse-sshfs"
mkdir -p "$DIR"
if ! $sshfs -o "$1" rootless@localhost: "$DIR"; then
skip "test requires working sshfs mounts"
if ! [ -v DIR ]; then
DIR="$BATS_RUN_TMPDIR/fuse-sshfs"
mkdir -p "$DIR"
# Make sure we clear all superblock flags to make sure bind-mounts can
# unset these flags.
if ! $sshfs -o rw,suid,dev,exec,atime rootless@localhost: "$DIR"; then
# fallback to tmpfs if running in without sshfs
mount -t tmpfs -o rw,suid,dev,exec,diratime,strictatime tmpfs "$DIR"
fi
fi
# Reset atime flags. "diratime" is quite a strange flag, so we need to make
# sure it's cleared before we apply the requested flags.
mount --bind -o remount,diratime,strictatime "$DIR"
# We need to set the mount flags separately on the mount because some mount
# flags (such as "ro") are set on the superblock if you do them in the
# initial mount, which means that they cannot be cleared by bind-mounts.
#
# This also lets us reconfigure the per-mount settings on each call.
mount --bind -o "remount,$1" "$DIR"
echo "configured $DIR with mount --bind -o remount,$1" >&2
awk '$2 == "'"$DIR"'"' </proc/self/mounts >&2
}
@test "runc run [rw bind mount of a ro fuse sshfs mount]" {
setup_sshfs "ro"
update_config ' .mounts += [{
type: "bind",
source: "'"$DIR"'",
destination: "/mnt",
options: ["rw", "rprivate", "nosuid", "nodev", "rbind"]
}]'
function setup_sshfs_bind_flags() {
host_flags="$1" # ro,nodev,nosuid
bind_flags="$2" # ro,nosuid,bind
runc run --no-mount-fallback test_busybox
setup_sshfs "$host_flags"
cat >"rootfs/find-tmp.awk" <<-'EOF'
#!/bin/awk -f
$2 == "/mnt" { print $4 }
EOF
chmod +x "rootfs/find-tmp.awk"
update_config '.process.args = ["sh", "-c", "/find-tmp.awk </proc/self/mounts"]'
update_config '.mounts = (.mounts | map(select(.destination != "/mnt"))) + [{
"source": "'"$DIR"'",
"destination": "/mnt",
"type": "bind",
"options": '"$(jq -cRM 'split(",")' <<<"$bind_flags")"'
}]'
}
function pass_sshfs_bind_flags() {
setup_sshfs_bind_flags "$@"
runc run test_busybox
[ "$status" -eq 0 ]
mnt_flags="$output"
}
function fail_sshfs_bind_flags() {
setup_sshfs_bind_flags "$@"
runc run test_busybox
[ "$status" -ne 0 ]
[[ "$output" == *"runc run failed: unable to start container process: error during container init: error mounting"*"operation not permitted"* ]]
}
@test "runc run [mount(8)-like behaviour: --bind with no options]" {
requires root
pass_sshfs_bind_flags "ro,noexec,nosymfollow,nodiratime" "bind"
# If no flags were specified alongside bind, we keep all existing flags.
# Unspecified flags must be cleared (rw default).
run -0 grep -wq ro <<<"$mnt_flags"
run ! grep -wq rw <<<"$mnt_flags"
run -0 grep -wq noexec <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# On old systems, mount doesn't know about nosymfollow, which turns the
# flag into a data argument (which is ignored by MS_REMOUNT).
if sshfs_has_flag nosymfollow; then run -0 grep -wq nosymfollow <<<"$mnt_flags"; fi
# Now try with a user namespace. The results should be the same as above.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
pass_sshfs_bind_flags "ro,noexec,nosymfollow,nodiratime" "bind"
# If no flags were specified alongside bind, we keep all existing flags.
# Unspecified flags must be cleared (rw default).
run -0 grep -wq ro <<<"$mnt_flags"
run ! grep -wq rw <<<"$mnt_flags"
run -0 grep -wq noexec <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# On old systems, mount doesn't know about nosymfollow, which turns the
# flag into a data argument (which is ignored by MS_REMOUNT).
if sshfs_has_flag nosymfollow; then run -0 grep -wq nosymfollow <<<"$mnt_flags"; fi
}
# This behaviour does not match mount(8), but is preferable to the alternative.
# See <https://github.com/util-linux/util-linux/issues/2433>.
@test "runc run [mount(8)-unlike behaviour: --bind with clearing flag]" {
requires root
pass_sshfs_bind_flags "ro,noexec,nosymfollow,nodiratime" "bind,dev"
# Unspecified flags must be cleared as well.
run ! grep -wq ro <<<"$mnt_flags"
run -0 grep -wq rw <<<"$mnt_flags"
run ! grep -wq noexec <<<"$mnt_flags"
run ! grep -wq nosymfollow <<<"$mnt_flags"
# FIXME FIXME: As with mount(8), trying to clear an atime flag the "naive"
# way will be ignored!
run -0 grep -wq nodiratime <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
pass_sshfs_bind_flags "ro,noexec,nosymfollow,nodiratime" "bind,dev"
# Lockable flags must be kept, because we didn't request them explicitly.
run -0 grep -wq ro <<<"$mnt_flags"
run ! grep -wq rw <<<"$mnt_flags"
run -0 grep -wq noexec <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# nosymfollow is not lockable, so it must be cleared.
run ! grep -wq nosymfollow <<<"$mnt_flags"
}
@test "runc run [implied-rw bind mount of a ro fuse sshfs mount]" {
requires root
pass_sshfs_bind_flags "ro" "bind,nosuid,nodev,rprivate"
# Unspecified flags must be cleared (rw default).
run ! grep -wq ro <<<"$mnt_flags"
run -0 grep -wq rw <<<"$mnt_flags"
# The new flags must be applied.
run -0 grep -wq nosuid <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
# Now try with a user namespace. The results should be the same as above.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
pass_sshfs_bind_flags "ro" "bind,nosuid,nodev,rprivate"
# "ro" must still be set (inherited).
run -0 grep -wq ro <<<"$mnt_flags"
# The new flags must be applied.
run -0 grep -wq nosuid <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
}
@test "runc run [explicit-rw bind mount of a ro fuse sshfs mount]" {
requires root
# Try to overwrite MS_RDONLY. As we are running in a userns-less container,
# we can overwrite MNT_LOCKED flags.
pass_sshfs_bind_flags "ro" "bind,rw,nosuid,nodev,rprivate"
# "ro" must be cleared and replaced with "rw".
run ! grep -wq ro <<<"$mnt_flags"
run -0 grep -wq rw <<<"$mnt_flags"
# The new flags must be applied.
run -0 grep -wq nosuid <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
# This must fail because we explicitly requested a mount with a MNT_LOCKED
# mount option cleared (when the source mount has those mounts enabled),
# namely MS_RDONLY.
fail_sshfs_bind_flags "ro" "bind,rw,nosuid,nodev,rprivate"
}
@test "runc run [dev,exec,suid,atime bind mount of a nodev,nosuid,noexec,noatime fuse sshfs mount]" {
setup_sshfs "nodev,nosuid,noexec,noatime"
# The "sync" option is used to trigger a remount with the below options.
# It serves no further purpose. Otherwise only a bind mount without
# applying the below options will be done.
update_config ' .mounts += [{
type: "bind",
source: "'"$DIR"'",
destination: "/mnt",
options: ["dev", "suid", "exec", "atime", "rprivate", "rbind", "sync"]
}]'
requires root
runc run test_busybox
[ "$status" -eq 0 ]
# When running without userns, overwriting host flags should work.
pass_sshfs_bind_flags "nosuid,nodev,noexec,noatime" "bind,dev,suid,exec,atime"
# Unspecified flags must be cleared (rw default).
run ! grep -wq ro <<<"$mnt_flags"
run -0 grep -wq rw <<<"$mnt_flags"
# Check that the flags were actually cleared by the mount.
run ! grep -wq nosuid <<<"$mnt_flags"
run ! grep -wq nodev <<<"$mnt_flags"
run ! grep -wq noexec <<<"$mnt_flags"
# FIXME FIXME: As with mount(8), trying to clear an atime flag the "naive"
# way will be ignored!
run -0 grep -wq noatime <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
# This must fail because we explicitly requested a mount with MNT_LOCKED
# mount options cleared (when the source mount has those mounts enabled).
fail_sshfs_bind_flags "nodev,nosuid,nosuid,noatime" "bind,dev,suid,exec,atime"
}
@test "runc run [ro bind mount of a nodev,nosuid,noexec,noatime fuse sshfs mount]" {
setup_sshfs "nodev,nosuid,noexec,noatime"
update_config ' .mounts += [{
type: "bind",
source: "'"$DIR"'",
destination: "/mnt",
options: ["rbind", "ro"]
}]'
# Test to ensure we don't regress bind-mounting /etc/resolv.conf with
# containerd <https://github.com/containerd/containerd/pull/8309>.
@test "runc run [ro bind mount of a nodev,nosuid,noexec fuse sshfs mount]" {
requires root
runc run test_busybox
[ "$status" -eq 0 ]
# Setting flags that are not locked should work.
pass_sshfs_bind_flags "rw,nodev,nosuid,nodev,noexec,noatime" "bind,ro"
# The flagset should be the union of the two.
run -0 grep -wq ro <<<"$mnt_flags"
# Unspecified flags must be cleared.
run ! grep -wq nosuid <<<"$mnt_flags"
run ! grep -wq nodev <<<"$mnt_flags"
run ! grep -wq noexec <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
# Setting flags that are not locked should work.
pass_sshfs_bind_flags "rw,nodev,nosuid,nodev,noexec,noatime" "bind,ro"
# The flagset should be the union of the two.
run -0 grep -wq ro <<<"$mnt_flags"
# (Unspecified MNT_LOCKED flags are inherited.)
run -0 grep -wq nosuid <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
run -0 grep -wq noexec <<<"$mnt_flags"
}
@test "runc run [dev,exec,suid,atime bind mount of a nodev,nosuid,noexec,noatime fuse sshfs mount without fallback]" {
setup_sshfs "nodev,nosuid,noexec,noatime"
# The "sync" option is used to trigger a remount with the below options.
# It serves no further purpose. Otherwise only a bind mount without
# applying the below options will be done.
update_config ' .mounts += [{
type: "bind",
source: "'"$DIR"'",
destination: "/mnt",
options: ["dev", "suid", "exec", "atime", "rprivate", "rbind", "sync"]
}]'
@test "runc run [ro,symfollow bind mount of a rw,nodev,nosymfollow fuse sshfs mount]" {
requires root
runc run --no-mount-fallback test_busybox
# The above will fail as we added --no-mount-fallback which causes us not to
# try to remount a bind mount again after the first attempt failed on source
# filesystems that have nodev, noexec, nosuid, noatime set.
[ "$status" -ne 0 ]
[[ "$output" == *"runc run failed: unable to start container process: error during container init: error mounting"*"operation not permitted"* ]]
pass_sshfs_bind_flags "rw,nodev,nosymfollow" "bind,ro,symfollow"
# Must switch to ro.
run -0 grep -wq ro <<<"$mnt_flags"
run ! grep -wq rw <<<"$mnt_flags"
# Unspecified flags must be cleared.
run ! grep -wq nodev <<<"$mnt_flags"
# nosymfollow must also be cleared.
run ! grep -wq nosymfollow <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
# Unsetting flags that are not lockable should work.
pass_sshfs_bind_flags "rw,nodev,nosymfollow" "bind,ro,symfollow"
# The flagset should be the union of the two.
run -0 grep -wq ro <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
# nosymfollow is not lockable, so it must be cleared.
run ! grep -wq nosymfollow <<<"$mnt_flags"
# Implied unsetting of non-lockable flags should also work.
pass_sshfs_bind_flags "rw,nodev,nosymfollow" "bind,rw"
# The flagset should be the union of the two.
run -0 grep -wq rw <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
# nosymfollow is not lockable, so it must be cleared.
run ! grep -wq nosymfollow <<<"$mnt_flags"
}
@test "runc run [ro bind mount of a nodev,nosuid,noexec,noatime fuse sshfs mount without fallback]" {
setup_sshfs "nodev,nosuid,noexec,noatime"
update_config ' .mounts += [{
type: "bind",
source: "'"$DIR"'",
destination: "/mnt",
options: ["rbind", "ro"]
}]'
@test "runc run [ro,noexec bind mount of a nosuid,noatime fuse sshfs mount]" {
requires root
runc run --no-mount-fallback test_busybox
# The above will fail as we added --no-mount-fallback which causes us not to
# try to remount a bind mount again after the first attempt failed on source
# filesystems that have nodev, noexec, nosuid, noatime set.
[ "$status" -ne 0 ]
[[ "$output" == *"runc run failed: unable to start container process: error during container init: error mounting"*"operation not permitted"* ]]
# Setting flags that are not locked should work.
pass_sshfs_bind_flags "nodev,nosuid,noatime" "bind,ro,exec"
# The flagset must match the requested set.
run -0 grep -wq ro <<<"$mnt_flags"
run ! grep -wq noexec <<<"$mnt_flags"
# Unspecified flags must be cleared.
run ! grep -wq nosuid <<<"$mnt_flags"
run ! grep -wq nodev <<<"$mnt_flags"
# FIXME: As with mount(8), runc keeps the old atime setting by default.
run -0 grep -wq noatime <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
# Setting flags that are not locked should work.
pass_sshfs_bind_flags "nodev,nosuid,noatime" "bind,ro,exec"
# The flagset should be the union of the two.
run -0 grep -wq ro <<<"$mnt_flags"
run ! grep -wq noexec <<<"$mnt_flags"
# (Unspecified MNT_LOCKED flags are inherited.)
run -0 grep -wq nosuid <<<"$mnt_flags"
run -0 grep -wq nodev <<<"$mnt_flags"
run -0 grep -wq noatime <<<"$mnt_flags"
}
@test "runc run [bind mount {no,rel,strict}atime semantics]" {
requires root
function is_strictatime() {
# There is no "strictatime" in /proc/self/mounts.
run ! grep -wq noatime <<<"${1:-$mnt_flags}"
run ! grep -wq relatime <<<"${1:-$mnt_flags}"
run ! grep -wq nodiratime <<<"${1:-$mnt_flags}"
}
# FIXME: As with mount(8), runc keeps the old atime setting by default.
pass_sshfs_bind_flags "noatime" "bind"
run -0 grep -wq noatime <<<"$mnt_flags"
run ! grep -wq relatime <<<"$mnt_flags"
# FIXME: As with mount(8), runc keeps the old atime setting by default.
pass_sshfs_bind_flags "noatime" "bind,norelatime"
run -0 grep -wq noatime <<<"$mnt_flags"
run ! grep -wq relatime <<<"$mnt_flags"
# FIXME FIXME: As with mount(8), trying to clear an atime flag the "naive"
# way will be ignored!
pass_sshfs_bind_flags "noatime" "bind,atime"
run -0 grep -wq noatime <<<"$mnt_flags"
run ! grep -wq relatime <<<"$mnt_flags"
# ... but explicitly setting a different flag works.
pass_sshfs_bind_flags "noatime" "bind,relatime"
run ! grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq relatime <<<"$mnt_flags"
# Setting a flag that mount(8) would combine should result in only the
# requested flag being set.
pass_sshfs_bind_flags "noatime" "bind,nodiratime"
run ! grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# MS_DIRATIME implies MS_RELATIME by default.
run -0 grep -wq relatime <<<"$mnt_flags"
# Clearing flags that mount(8) would not clear works.
pass_sshfs_bind_flags "nodiratime" "bind,strictatime"
is_strictatime "$mnt_flags"
# nodiratime is a little weird -- it implies relatime unless you set
# another option (noatime or strictatime). But, runc also has norelatime --
# so nodiratime,norelatime should _probably_ result in the same thing as
# nodiratime,strictatime.
pass_sshfs_bind_flags "noatime" "bind,nodiratime,strictatime"
run ! grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
run ! grep -wq relatime <<<"$mnt_flags"
# FIXME FIXME: relatime should not be set in this case.
pass_sshfs_bind_flags "noatime" "bind,nodiratime,norelatime"
run ! grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
run -0 grep -wq relatime <<<"$mnt_flags"
# Now try with a user namespace.
update_config ' .linux.namespaces += [{"type": "user"}]
| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
# Requesting a mount without specifying any preference for atime works, and
# inherits the original flags.
pass_sshfs_bind_flags "strictatime" "bind"
is_strictatime "$mnt_flags"
pass_sshfs_bind_flags "relatime" "bind"
run -0 grep -wq relatime <<<"$mnt_flags"
pass_sshfs_bind_flags "nodiratime" "bind"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# MS_DIRATIME implies MS_RELATIME by default.
run -0 grep -wq relatime <<<"$mnt_flags"
pass_sshfs_bind_flags "noatime,nodiratime" "bind"
run -0 grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# An unrelated clear flag has no effect.
pass_sshfs_bind_flags "noatime,nodiratime" "bind,norelatime"
run -0 grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
# Attempting to change most *atime flags will fail with user namespaces
# because *atime flags are all MNT_LOCKED.
fail_sshfs_bind_flags "nodiratime" "bind,strictatime"
fail_sshfs_bind_flags "relatime" "bind,strictatime"
fail_sshfs_bind_flags "noatime" "bind,strictatime"
fail_sshfs_bind_flags "nodiratime" "bind,noatime"
fail_sshfs_bind_flags "relatime" "bind,noatime"
fail_sshfs_bind_flags "relatime" "bind,nodiratime"
# Make sure strictatime sources are correctly handled by runc (the kernel
# ignores some other mount flags when passing MS_STRICTATIME). See
# remount() in rootfs_linux.go for details.
fail_sshfs_bind_flags "strictatime" "bind,relatime"
fail_sshfs_bind_flags "strictatime" "bind,noatime"
fail_sshfs_bind_flags "strictatime" "bind,nodiratime"
# Make sure that runc correctly handles the MS_NOATIME|MS_RELATIME kernel
# bug. See remount() in rootfs_linux.go for more details.
fail_sshfs_bind_flags "noatime" "bind,relatime"
# Attempting to bind-mount a mount with a request to clear the atime
# setting that would normally inherited must not work.
# FIXME FIXME: All of these cases should fail.
pass_sshfs_bind_flags "strictatime" "bind,nostrictatime"
is_strictatime "$mnt_flags"
pass_sshfs_bind_flags "nodiratime" "bind,diratime"
run -0 grep -wq nodiratime <<<"$mnt_flags"
pass_sshfs_bind_flags "nodiratime" "bind,norelatime" # MS_DIRATIME implies MS_RELATIME
run -0 grep -wq nodiratime <<<"$mnt_flags"
pass_sshfs_bind_flags "relatime" "bind,norelatime"
run -0 grep -wq relatime <<<"$mnt_flags"
pass_sshfs_bind_flags "noatime" "bind,atime"
run -0 grep -wq noatime <<<"$mnt_flags"
pass_sshfs_bind_flags "noatime,nodiratime" "bind,atime"
run -0 grep -wq noatime <<<"$mnt_flags"
run -0 grep -wq nodiratime <<<"$mnt_flags"
}

View File

@@ -180,7 +180,6 @@ func createContainer(context *cli.Context, id string, spec *specs.Spec) (*libcon
Spec: spec,
RootlessEUID: os.Geteuid() != 0,
RootlessCgroups: rootlessCg,
NoMountFallback: context.Bool("no-mount-fallback"),
})
if err != nil {
return nil, err