zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-10-24 16:10:29 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	b247cd392a	runc run: fix ro /dev Commit `fb4c27c4b7` (went into v1.0.0-rc93) fixed a bug with read-only tmpfs, but introduced a bug with read-only /dev. This happens because /dev is a tmpfs mount and is therefore remounted read-only a bit earlier than before. To fix, 1. Revert the part of the above commit which remounts all tmpfs mounts as read-only in mountToRootfs. 2. Reuse finalizeRootfs (which is already used to remount /dev read-only) to also remount all ro tmpfs mounts that were previously mounted rw in mountPropagate. 3. Remove the break in finalizeRootfs, as now we have more than one mount to care about. 4. Reorder the if statements in finalizeRootfs to perform the fast check (for ro flag) first, and compare the strings second. Since /dev is most probably also a tmpfs mount, do the m.Device check first. Add a test case to validate the fix and prevent future regressions; make sure it fails before the fix: ✗ runc run [ro /dev mount] (in test file tests/integration/mounts.bats, line 45) `[ "$status" -eq 0 ]' failed runc spec (status=0): runc run test_busybox (status=1): time="2021-11-12T12:19:48-08:00" level=error msg="runc run failed: unable to start container process: error during container init: error mounting \"devpts\" to rootfs at \"/dev/pts\": mkdir /tmp/bats-run-VJXQk7/runc.0Fj70w/bundle/rootfs/dev/pts: read-only file system" Fixes: `fb4c27c4b7` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-15 10:37:16 -08:00
Kir Kolyshkin	7563a8f06d	libct: wrap more unix errors When I tried to start a rootless container under a different/wrong user, I got: $ ../runc/runc --systemd-cgroup --root /tmp/runc.$$ run 445 ERRO[0000] runc run failed: operation not permitted This is obviously not good enough. With this commit, the error is: ERRO[0000] runc run failed: fchown fd 9: operation not permitted Alas, there is still some code that returns unwrapped errnos from various unix calls. This is a follow-up to commit `d8ba4128b2` which wrapped many, but not all, bare unix errors. Do wrap some more, using either os.PathError or os.SyscallError. While at it, - use os.SyscallError instead of os.NewSyscallError; - use errors.Is(err, os.ErrXxx) instead of os.IsXxx(err). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-12 00:33:59 -08:00
Akihiro Suda	4d17654479	Merge pull request #2576 from kinvolk/alban/userns-2484-take2 Open bind mount sources from the host userns	2021-10-28 14:50:33 +09:00
Kir Kolyshkin	5516294172	Remove io/ioutil use See https://golang.org/doc/go1.16#ioutil Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-10-14 13:46:02 -07:00
Alban Crequy	9c444070ec	Open bind mount sources from the host userns The source of the bind mount might not be accessible in a different user namespace because a component of the source path might not be traversed under the users and groups mapped inside the user namespace. This caused errors such as the following: # time="2020-06-22T13:48:26Z" level=error msg="container_linux.go:367: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:58: mounting \"/tmp/busyboxtest/source-inaccessible/dir\" to rootfs at \"/tmp/inaccessible\" caused: stat /tmp/busyboxtest/source-inaccessible/dir: permission denied" To solve this problem, this patch performs the following: 1. in nsexec.c, it opens the source path in the host userns (so we have the right permissions to open it) but in the container mntns (so the kernel cross-mntns mount check lets us mount it later: https://github.com/torvalds/linux/blob/v5.8/fs/namespace.c#L2312). 2. in nsexec.c, it passes the file descriptors of the source to the child process with SCM_RIGHTS. 3. In runc-init in Golang, it finishes the mounts while inside the userns even without access to some components of the source paths. Passing the fds with SCM_RIGHTS is necessary because once the child process is in the container mntns, it is already in the container userns so it cannot temporarily join the host mntns. This patch uses the existing mechanism with _LIBCONTAINER_* environment variables to pass the file descriptors from runc to runc init. This patch uses the existing mechanism with the Netlink-style bootstrap to pass information about the list of source mounts to nsexec.c. Rootless containers don't use this bind mount sources fd-passing mechanism because we can't setns() to the target mntns in a rootless container (we don't have the privileges when we are in the host userns). This patch takes care of using O_CLOEXEC on mount fds and closing them early. Fixes: #2484.
Signed-off-by: Alban Crequy <alban@kinvolk.io> Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io> Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>	2021-10-12 15:13:45 +02:00
Kir Kolyshkin	9ff64c3d97	*: rm redundant linux build tag For files that end with _linux.go or _linux_test.go, there is no need to specify linux build tag, as it is assumed from the file name. In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go for the file name to make sense. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-30 20:15:00 -07:00
Aleksa Sarai	09b80811f6	Revert "libct/devices: change devices.Type to be a string" This reverts commit `814f3ae1d9`. This changed the on-disk state which breaks runc when it has to operate on containers started with an older runc version. Working around this is far more complicated than just reverting it. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2021-08-25 14:11:32 +10:00
Kir Kolyshkin	34df203d13	Merge pull request #3159 from thaJeztah/norunes libct/devices: change devices.Type to be a string	2021-08-23 16:58:34 -07:00
Kir Kolyshkin	75761bccf7	Fix codespell warnings, add codespell to ci The two exceptions I had to add to codespellrc are: - CLOS (used by intelrtd); - creat (syscall name used in tests/integration/testdata/seccomp_*.json). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-17 16:12:35 -07:00
Sebastiaan van Stijn	814f3ae1d9	libct/devices: change devices.Type to be a string Possibly there was a specific reason to use a rune for this, but I noticed that there are various parts of the code that have to convert values from a string to this type. Using a string as the type for this can simplify some of that code. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-08-13 00:55:22 +02:00
Akihiro Suda	5547b5774f	Merge pull request #3033 from kolyshkin/rm-own-errors libcontainer: rm own error system	2021-07-01 13:47:27 +09:00
Kailun Qin	c508a7bc0a	libct/rootfs: consolidate utils imports Signed-off-by: Kailun Qin <kailun.qin@intel.com>	2021-06-30 06:49:38 -04:00
Kir Kolyshkin	e918d02139	libcontainer: rm own error system This removes libcontainer's own error wrapping system, consisting of a few types and functions, aimed at typization, wrapping and unwrapping of errors, as well as saving error stack traces. Since Go 1.13 now provides its own error wrapping mechanism and a few related functions, it makes sense to switch to it. While doing that, improve some error messages so that they start with "error", "unable to", or "can't". A few things that are worth mentioning: 1. We lose stack traces (which were never shown anyway). 2. Users of libcontainer that relied on particular errors (like ContainerNotExists) need to switch to using errors.Is with the new errors defined in error.go. 3. encoding/json is unable to unmarshal the built-in error type, so we have to introduce initError and wrap the errors into it (basically passing the error as a string). This is the same as it was before, just a tad simpler (actually the initError is a type that got removed in commit afa844311; also suddenly ierr variable name makes sense now). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-24 10:21:04 -07:00
Kir Kolyshkin	7be93a66b9	*: fmt.Errorf: use %w when appropriate This should result in no change when the error is printed, but make the errors returned unwrappable, meaning errors.As and errors.Is will work. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-22 16:09:47 -07:00
Kir Kolyshkin	d8ba4128b2	libct/rootfs: improve some errors Errors from os.Open, os.Symlink, etc. do not need to be wrapped, as they are already wrapped into os.PathError. Errors from unix are bare errnos and need to be wrapped; os.PathError is a good candidate here as well. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-22 16:09:47 -07:00
Kir Kolyshkin	36aefad45d	libct: wrap unix.Mount/Unmount errors Errors returned by unix are bare. In some cases it's impossible to find out what went wrong because there is not enough context. Add a mountError type (mostly copy-pasted from github.com/moby/sys/mount), and mount/unmount helpers. Use these where appropriate, and convert error checks to use errors.Is. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-22 16:09:37 -07:00
Kir Kolyshkin	e6048715e4	Use gofumpt to format code gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules. Brought to you by `git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w`. Looking at the diff, all these changes make sense. Also, replace gofmt with gofumpt in golangci.yml. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-01 12:17:27 -07:00
Sebastiaan van Stijn	b45fbd43b8	errcheck: libcontainer Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-05-20 14:19:26 +02:00
Aleksa Sarai	0ca91f44f1	rootfs: add mount destination validation Because the target of a mount is inside a container (which may be a volume that is shared with another container), there exists a race condition where the target of the mount may change to a path containing a symlink after we have sanitised the path -- resulting in us inadvertently mounting the path outside of the container. This is not immediately useful because we are in a mount namespace with MS_SLAVE mount propagation applied to "/", so we cannot mount on top of host paths in the host namespace. However, if any subsequent mountpoints in the configuration use a subdirectory of that host path as a source, those subsequent mounts will use an attacker-controlled source path (resolved within the host rootfs) -- allowing the bind-mounting of "/" into the container. While arguably configuration issues like this are not entirely within runc's threat model, within the context of Kubernetes (and possibly other container managers that provide semi-arbitrary container creation privileges to untrusted users) this is a legitimate issue. Since we cannot block mounting from the host into the container, we need to block the first stage of this attack (mounting onto a path outside the container). The long-term plan to solve this would be to migrate to libpathrs, but as a stop-gap we implement libpathrs-like path verification through readlink(/proc/self/fd/$n) and then do mount operations through the procfd once it's been verified to be inside the container. The target could move after we've checked it, but if it is inside the container then we can assume that it is safe for the same reason that libpathrs operations would be safe. A slight wrinkle is the "copyup" functionality we provide for tmpfs, which is the only case where we want to do a mount on the host filesystem. To facilitate this, I split out the copy-up functionality entirely so that the logic isn't interspersed with the regular tmpfs logic. 
In addition, all dependencies on m.Destination being overwritten have been removed since that pattern was just begging to be a source of more mount-target bugs (we do still have to modify m.Destination for tmpfs-copyup but we only do it temporarily). Fixes: CVE-2021-30465 Reported-by: Etienne Champetier <champetier.etienne@gmail.com> Co-authored-by: Noah Meyerhans <nmeyerha@amazon.com> Reviewed-by: Samuel Karp <skarp@amazon.com> Reviewed-by: Kir Kolyshkin <kolyshkin@gmail.com> (@kolyshkin) Reviewed-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2021-05-19 16:58:35 +10:00
Kir Kolyshkin	ff692f289b	Fix cgroup2 mount for rootless case In case of rootless, cgroup2 mount is not possible (see [1] for more details), so since commit `9c81440fb5` runc bind-mounts the whole /sys/fs/cgroup into container. Problem is, if cgroupns is enabled, /sys/fs/cgroup inside the container is supposed to show the cgroup files for this cgroup, not the root one. The fix is to pass through and use the cgroup path in case cgroup2 mount failed, cgroupns is enabled, and the path is non-empty. Surely this requires the /sys/fs/cgroup mount in the spec, so modify runc spec --rootless to keep it. Before: $ ./runc run aaa # find /sys/fs/cgroup/ -type d /sys/fs/cgroup /sys/fs/cgroup/user.slice /sys/fs/cgroup/user.slice/user-1000.slice /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service ... # ls -l /sys/fs/cgroup/cgroup.controllers -r--r--r-- 1 nobody nogroup 0 Feb 24 02:22 /sys/fs/cgroup/cgroup.controllers # wc -w /sys/fs/cgroup/cgroup.procs 142 /sys/fs/cgroup/cgroup.procs # cat /sys/fs/cgroup/memory.current cat: can't open '/sys/fs/cgroup/memory.current': No such file or directory After: # find /sys/fs/cgroup/ -type d /sys/fs/cgroup/ # ls -l /sys/fs/cgroup/cgroup.controllers -r--r--r-- 1 root root 0 Feb 24 02:43 /sys/fs/cgroup/cgroup.controllers # wc -w /sys/fs/cgroup/cgroup.procs 2 /sys/fs/cgroup/cgroup.procs # cat /sys/fs/cgroup/memory.current 577536 [1] https://github.com/opencontainers/runc/issues/2158 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-20 12:35:40 -07:00
Kir Kolyshkin	3826db196d	libct/rootfs/mountCgroupV2: minor refactor 1. s/cgroupPath/dest/ 2. don't hardcode /sys/fs/cgroup Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-20 12:30:30 -07:00
Kir Kolyshkin	1e476578b6	libct/rootfs: introduce and use mountConfig The code is already passing three parameters around from mountToRootfs to mountCgroupV* to mountToRootfs again. I am about to add another parameter, so let's introduce and use struct mountConfig to pass around. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-20 12:30:06 -07:00
Kir Kolyshkin	a2050ea471	runc run: fix start for rootless + host pidns Currently, runc fails like this when used from rootless podman with host PID namespace: > $ podman --runtime=runc run --pid=host --rm -it busybox sh > WARN[0000] additional gid=10 is not present in the user namespace, skip setting it > Error: container_linux.go:380: starting container process caused: > process_linux.go:545: container init caused: readonly path /proc/asound: > operation not permitted: OCI permission denied (Here /proc/asound is the first path from OCI spec's readonlyPaths). The code uses MS_BIND|MS_REMOUNT flags that have a special meaning in the kernel ("keep the flags like nodev, nosuid, noexec as is"). For some reason, this "special meaning" trick is not working for the above use case (rootless podman + no PID namespace), and I don't know how to reproduce this without podman. Instead of relying on the kernel feature, let's just get the current mount flags using fstatfs(2) and add those that need to be preserved. While at it, wrap errors from unix.Mount into os.PathError to make errors a bit less cryptic. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-14 17:32:08 -07:00
Sebastiaan van Stijn	4316df8b53	libcontainer/system: move userns utilities to separate package Moving these utilities to a separate package, so that consumers of this package don't have to pull in the whole "system" package. Looking at uses of these utilities (outside of runc itself); `RunningInUserNS()` is used by [various external consumers][1], so adding a "Deprecated" alias for this. [1]: https://grep.app/search?current=2&q=.RunningInUserNS Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-04-04 22:42:03 +02:00
Akihiro Suda	e7bd1fb10a	Merge pull request #2717 from kolyshkin/check-proc-opt libct/checkProcMounts: optimize	2021-01-29 17:32:45 +09:00
Kir Kolyshkin	692fab0936	libct/checkProcMounts: optimize Commit `9c1242ecb` ("Add white list for bind mount chec", Jan 6 2016) added a set of entries under /proc which we allow to be mounted to, for the benefit of an lxcfs-like fuse-backed hack that gives the container its own version of /proc/meminfo etc. For some reason, the allow list check is performed at the very beginning of the function, which is not optimal. Move the check to the end -- at this point in the code we already know we're under /proc, so it makes sense to consult the allow list. This makes the code slightly more logical and hopefully slightly faster. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-01-06 15:02:38 -08:00
Kir Kolyshkin	637f82d6b8	runc run: resolve tmpfs mount dest in container scope In case a tmpfs mount path contains absolute symlinks, runc errors out because those symlinks are resolved in the host (rather than container) filesystem scope. The fix is similar to that for bind mounts -- resolve the destination in container rootfs scope using securejoin, and use the resolved path. A simple integration test case is added to prevent future regressions. Fixes https://github.com/opencontainers/runc/issues/2683. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-01-06 14:57:02 -08:00
Feng Sun	48b8eb0952	checkProcMount: add /proc/slabinfo to whitelist With the following lxcfs commit, /proc/slabinfo can be mounted: "proc_fuse: add /proc/slabinfo with slab accounting memcg" https://github.com/lxc/lxcfs/commit/1cc68c8bfa Signed-off-by: Feng Sun <loyou85@gmail.com>	2020-12-16 09:40:04 +08:00
Sebastiaan van Stijn	677baf22d2	libcontainer: isolate libcontainer/devices Move the Device-related types to libcontainer/devices, so that the package can be used in isolation. Aliases have been created in libcontainer/configs for backward compatibility. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-12-01 11:11:21 +01:00
Giuseppe Scrivano	41aa764010	linux: drop MS_REC for readonly remount it has no effect. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-11-06 11:36:50 +01:00
Giuseppe Scrivano	a4e6955e31	linux: fix remount readonly in a user namespace If we are remounting root read-only while in a user namespace, make sure the existing flags (e.g. MS_NOEXEC, MS_NODEV) are preserved; otherwise the mount fails with EPERM. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-11-06 11:35:40 +01:00
Akihiro Suda	9d4c02cf29	Merge pull request #2570 from EduardoVega/2246-fix-chmod-ro-tmpfs-mount Fix mount error when chmod RO tmpfs	2020-10-26 18:19:16 +09:00
Aleksa Sarai	b8bf572812	rootfs: handle nested procfs mounts for MS_MOVE In a case where the host /proc mount has already been overmounted, the MS_MOVE handling would get ENOENT when trying to hide (for instance) "/proc/bus" because it had already hidden away "/proc". This revealed two issues in the previous implementation of this hardening feature: 1. No checks were done to make sure the mount was a "full" mount (that is, a mount of the root of the filesystem), but the kernel doesn't permit a non-full mount to be converted to a full mount (for reference, see mnt_already_visible). This just removes extra busy-work during setup. 2. ENOENT was treated as a critical error, even though it actually indicates the mount doesn't exist and thus isn't a problem. A more theoretically pure solution would be to store the set of mountpoints to be hidden and only ignore the error if an ancestor directory of the current mountpoint was already hidden, but that would just add complexity with little justification. In addition, better document the reasoning behind this logic so that folks aren't confused when looking at it. Fixes: `28a697cce3` ("rootfs: umount all procfs and sysfs with --no-pivot") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2020-10-13 20:54:56 +11:00
Eduardo Vega	fb4c27c4b7	Fix mount error when chmod RO tmpfs Signed-off-by: Eduardo Vega <edvegavalerio@gmail.com>	2020-10-05 21:23:30 -06:00
Kir Kolyshkin	87412ee435	vendor: bump mountinfo v0.3.1 It contains some breaking changes, so fix the code. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-10-01 18:51:25 -07:00
Akihiro Suda	e5f2eae5a5	Merge pull request #2558 from rhatdan/windows Since no kernels support direct labeling of /dev/mqueue remove label	2020-08-22 04:43:36 +09:00
Daniel J Walsh	0445fd60a4	Since no kernels support direct labeling of /dev/mqueue remove label This label appears to have just been filling the logs for years, since the kernel never added support for automatically labeling /dev/mqueue. Removes these dmesg lines [ 1731.969847] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1736.985146] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1738.356796] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1738.479952] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1738.628935] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1763.433276] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1806.802133] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1806.982003] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1808.955390] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1815.951076] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1827.257757] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1828.947888] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1834.964451] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 1835.941465] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2020-08-20 13:56:19 -04:00
Giuseppe Scrivano	a63f99fcc5	Add support for umask Signed-off-by: Ashley Cui <acui@redhat.com>	2020-08-20 11:39:43 -04:00
Aleksa Sarai	2265daa55b	merge branch 'pr-2522' into master Cesar Talledo (2): Remove runc default devices that overlap with spec devices. Skip redundant setup for /dev/ptmx when specified explicitly in the OCI spec. LGTMs: @AkihiroSuda @cyphar Closes #2522	2020-08-19 16:58:23 +10:00
Cesar Talledo	9a699e1a9f	Skip redundant setup for /dev/ptmx when specified explicitly in the OCI spec. Per the OCI spec, /dev/ptmx is always a symlink to /dev/pts/ptmx. As such, if the OCI spec has an explicit entry for /dev/ptmx, runc shall ignore it. This change ensures this is the case. An integration test was also added (in tests/integration/dev.bats). Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>	2020-08-07 16:46:26 -07:00
Sebastiaan van Stijn	901dccf05d	vendor: update runtime-spec v1.0.3-0.20200728170252-4d89ac9fbff6 Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-07-30 22:08:54 +02:00
Xiaodong Liu	af283b3f47	remove the redundant parameter of the chroot function Signed-off-by: Xiaodong Liu <liuxiaodong@loongson.cn>	2020-07-15 16:22:07 +08:00
Renaud Gaubert	ccdd75760c	Add the CreateRuntime, CreateContainer and StartContainer Hooks Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2020-06-17 02:10:00 +00:00
Aleksa Sarai	24388be71e	configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	b2bec9806f	cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black- list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Sebastiaan van Stijn	64ca54816c	libcontainer: simplify error message The error message was including both the rootfs path, and the full mount path, which also includes the path of the rootfs. This patch removes the rootfs path from the error message, as it was redundant, and made the error message overly verbose Before this patch (errors wrapped for readability): ``` container_linux.go:348: starting container process caused: process_linux.go:438: container init caused: rootfs_linux.go:58: mounting "/foo.txt" to rootfs "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged" at "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged/usr/share/nginx/html" caused: not a directory: unknown ``` With this patch applied: ``` container_linux.go:348: starting container process caused: process_linux.go:438: container init caused: rootfs_linux.go:58: mounting "/foo.txt" to rootfs at "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged/usr/share/nginx/html" caused: not a directory: unknown ``` Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-05-03 02:59:46 +02:00
Kir Kolyshkin	55d5c99ca7	libct/mountToRootfs: rm useless code To make a bind mount read-only, it needs to be remounted. This is what the code removed does, but it is not needed here. We have to deal with three cases here: 1. cgroup v2 unified mode. In this case the mount is a real mount with fstype=cgroup2, and there is no need to have a bind mount on top, as we pass the readonly flag to the mount as is. 2. cgroup v1 + cgroupns (enableCgroupns == true). In this case the "mount" is in fact a set of real mounts with fstype=cgroup, and they are all performed in mountCgroupV1, with the readonly flag added if needed. 3. cgroup v1 as is (enableCgroupns == false). In this case mountCgroupV1() calls mountToRootfs() again with an argument from the list obtained from getCgroupMounts(), i.e. a bind mount with the same flags as the original mount has (plus unix.MS_BIND | unix.MS_REC), and mountToRootfs() does remounting (under the case "bind":). So, the code which this patch is removing is not needed -- it essentially does nothing in case 3 above (since the bind mount is already remounted readonly), and in cases 1 and 2 it creates an unneeded extra bind mount on top of a real one (or set of real ones). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-23 16:49:12 -07:00
Kir Kolyshkin	9280e3566d	checkpoint/restore: fix cgroupv2 handling In case of cgroupv2 unified hierarchy, the /sys/fs/cgroup mount is the real mount with fstype of cgroup2 (rather than a set of external bind mounts like for cgroupv1). So, we should not add it to the list of "external bind mounts" on either checkpoint or restore. Without this fix, checkpoint integration tests fail on cgroup v2. The same is true for cgroup v1 + cgroupns. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-22 11:26:43 -07:00
Kir Kolyshkin	dd7b34618f	libct/msMoveRoot: benefit from GetMounts filter Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-21 10:33:43 -07:00
Kir Kolyshkin	fc4357a8b0	libct/msMoveRoot: rm redundant filepath.Abs() calls 1. rootfs is already validated to be kosher by (*ConfigValidator).rootfs() 2. mount points from /proc/self/mountinfo are absolute and clean, too Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-21 10:33:43 -07:00
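
Commit `692fab0936` moves the allow-list lookup in checkProcMounts to the end, after the code has established that the destination lies under /proc. Commit `48b8eb0952` adds /proc/slabinfo to that list. The following is a minimal sketch of that ordering. `checkProcDest` is a hypothetical simplification, and `procAllowList` holds only the two entries named in these commits rather than runc's full list.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// procAllowList is illustrative only: the two entries mentioned in the
// commits above, not runc's complete list.
var procAllowList = []string{
	"/proc/meminfo",
	"/proc/slabinfo",
}

// checkProcDest rejects mount destinations below /proc unless allow-listed.
// The allow list is consulted last, once we know dest is inside /proc.
func checkProcDest(dest string) error {
	dest = filepath.Clean(dest)
	rel, err := filepath.Rel("/proc", dest)
	if err != nil {
		return err
	}
	// Outside /proc (e.g. "/etc" gives "../etc"): nothing to check.
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return nil
	}
	// /proc itself: this sketch simply accepts it.
	if rel == "." {
		return nil
	}
	for _, allowed := range procAllowList {
		if dest == allowed {
			return nil
		}
	}
	return fmt.Errorf("%q cannot be mounted because it is inside /proc", dest)
}

func main() {
	for _, d := range []string{"/etc/hosts", "/proc/slabinfo", "/proc/sys/kernel"} {
		fmt.Println(d, checkProcDest(d))
	}
}
```

Paths outside /proc, which are the common case, return before the allow list is scanned. This is the "slightly faster" ordering the commit message describes.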


208 Commits