zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-12-24 11:50:58 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	6ede591761	internal/systemd: simplify Remove unused code and argument from the ActivationFiles, and simplify its usage. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-08 15:34:58 -08:00
Kir Kolyshkin	ba9e60f7a8	Remove crypto/tls dependency It appears that when we import github.com/coreos/go-systemd/activation, it brings in the whole crypto/tls package (which is not used by runc directly or indirectly), making the runc binary size larger and potentially creating issues with FIPS compliance. Let's copy the code of function we use from go-systemd/activation to avoid that. The space savings are: $ size runc.before runc.after text data bss dec hex filename 7101084 5049593 271560 12422237 bd8c5d runc.before 6508796 4623281 229128 11361205 ad5bb5 runc.after Reported-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-08 15:31:42 -08:00
Akihiro Suda	64c3c8eea6	Merge pull request #4994 from kolyshkin/gofumpt-extra Enable gofumpt extra rules	2025-11-28 09:30:57 +09:00
Aleksa Sarai	195e9551e4	pathrs: add MkdirAllParentInRoot helper While CreateInRoot supports hallucinating the target path, we do not use it directly when constructing device inode targets because we need to have different handling for mknod and bind-mounts. The solution is to simply have a more generic MkdirAllParentInRoot helper that MkdirAll's the parent directory of the target path and then allows the caller to create the trailing component however they like. (This can be used by CreateInRoot internally as well!) Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:05 +11:00
Aleksa Sarai	cfb74326be	pathrs: add "hallucination" helpers for SecureJoin magic In order to maintain compatibility with previous releases of runc (which permitted dangling symlinks as path components by permitting non-existent path components to be treated like real directories) we have to first do SecureJoin to construct a target path that is compatible with the old behaviour but has all dangling symlinks (or other invalid paths like ".." components after non-existent directories) removed. This is effectively a more generic version of commit `3f925525b4` ("rootfs: re-allow dangling symlinks in mount targets") and will let us remove the need for open-coding SecureJoin workarounds. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:05 +11:00
Aleksa Sarai	20c5a8ec4a	pathrs: rename MkdirAllInRootOpen -> MkdirAllInRoot Now that MkdirAllInRoot has been removed, we can make MkdirAllInRootOpen less wordy by renaming it to MkdirAllInRoot. This is a non-functional change. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:04 +11:00
Aleksa Sarai	9dbd37e06f	libct: switch final WithProcfd users to WithProcfdFile This probably should've been done as part of commit `d40b3439a9` ("rootfs: switch to fd-based handling of mountpoint targets") but it seems I missed them when doing the rest of the conversions. This also lets us remove utils.WithProcfd entirely, as well as pathrs.MkdirAllInRoot. Unfortunately, WithProcfd was exposed in the externally-importable "libcontainer/utils" package and so we need to have a deprecation notice to remove it in runc 1.5. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:03:30 +11:00
Aleksa Sarai	42a1e19d67	libcontainer: move CleanPath and StripRoot to internal/pathrs These helpers will be needed for the compatibility code added in future patches in this series, but because "internal/pathrs" is imported by "libcontainer/utils" we need to move them so that we can avoid circular dependencies. Because the old functions were in a non-internal package it is possible some downstreams use them, so add some wrappers but mark them as deprecated. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:03:29 +11:00
Kir Kolyshkin	67840cce4b	Enable gofumpt extra rules Commit `b2f8a74d` "clothed" the naked return as inflicted by gofumpt v0.9.0. Since gofumpt v0.9.2 this rule was moved to "extra" category, not enabled by default. The only other "extra" rule is to group adjacent parameters with the same type, which also makes sense. Enable gofumpt "extra" rules, and reformat the code accordingly. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-10 13:18:45 -08:00
Aleksa Sarai	a0e809a8ba	libct: switch to unix.SetMemPolicy wrapper This is mostly a mechanical change, but we also need to change some types to match the "mode int" argument that golang.org/x/sys/unix decided to use. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-10 16:03:02 +11:00
Aleksa Sarai	96f1962f91	deps: update to github.com/opencontainers/selinux@v0.13.0 This new version includes the fixes for CVE-2025-52881, so we can remove the internal/third_party copy of the library we added in commit `ed6b1693b8` ("selinux: use safe procfs API for labels") as well as the "replace" directive in go.mod (which is problematic for "go get" installs). Fixes: `ed6b1693b8` ("selinux: use safe procfs API for labels") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-08 02:14:38 +11:00
Aleksa Sarai	a41366e740	openat2: improve resilience on busy systems Previously, we would see a ~3% failure rate when starting containers with mounts that contain ".." (which can trigger -EAGAIN). To counteract this, filepath-securejoin v0.5.1 includes a bump of the internal retry limit from 32 to 128, which lowers the failure rate to 0.12%. However, there is still a risk of spurious failure on regular systems. In order to try to provide more resilience (while avoiding DoS attacks), this patch also includes an additional retry loop that terminates based on a deadline rather than retry count. The deadline is 2ms, as my testing found that ~800us for a single pathrs operation was the longest latency due to -EAGAIN retries, and that was an outlier compared to the more common ~400us latencies -- so 2ms should be more than enough for any real system. The failure rates above were based on more than 50k runs of runc with an attack script (from libpathrs) running a rename attack on all cores of a 16-core system, which is arguably a worst-case but heavily utilised servers could likely approach similar results. Tested-by: Phil Estes <estesp@gmail.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-05 18:57:51 +11:00
Aleksa Sarai	ed6b1693b8	selinux: use safe procfs API for labels Due to the sensitive nature of these fixes, it was not possible to submit these upstream and vendor the upstream library. Instead, this patch uses a fork of github.com/opencontainers/selinux, branched at commit opencontainers/selinux@879a755db5. In order to permit downstreams to build with this patched version, a snapshot of the forked version has been included in internal/third_party/selinux. Note that since we use "go mod vendor", the patched code is usable even without being "go get"-able. Once the embargo for this issue is lifted we can submit the patches upstream and switch back to a proper upstream go.mod entry. Also, this requires us to temporarily disable the CI job we have that disallows "replace" directives. Fixes: GHSA-cgrx-mc8f-2prm CVE-2025-52881 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:06 +11:00
Aleksa Sarai	d40b3439a9	rootfs: switch to fd-based handling of mountpoint targets An attacker could race with us during mount configuration in order to trick us into mounting over an unexpected path. This would bypass checkProcMount() and would allow for security profiles to be left unapplied by mounting over /proc/self/attr/... (or even more serious outcomes such as killing the entire system by tricking runc into writing strings to /proc/sysrq-trigger). This is a larger issue with our current mount infrastructure, and the ideal solution would be to rewrite it all to be fd-based (which would also allow us to support the "new" mount API, which also avoids a bunch of other issues with mount(8)). However, such a rewrite is not really workable as a security fix, so this patch is a bit of a compromise approach to fix the issue while also moving us a bit towards that eventual end-goal. The core issue in CVE-2025-52881 is that we currently use the (insecure) SecureJoin to re-resolve mountpoint target paths multiple times during mounting. Rather than generating a string from createMountpoint(), we instead open an os.File handle to the target mountpoint directly and then operate on that handle. This will make it easier to remove utils.WithProcfd() and rework mountViaFds() in the future. The only real issue we need to work around is that we need to re-open the mount target after doing the mount in order to get a handle to the mountpoint -- pathrs.Reopen() doesn't work in this case (it just re-opens the inode under the mountpoint) so we need to do a naive re-open using the full path. Note that if we used move_mount(2) this wouldn't be a problem because we would have a handle to the mountpoint itself. Note that this is still somewhat of a temporary solution -- ideally mountViaFds would use os.File directly to let us avoid some other issues with using bare /proc/... paths, as well as also letting us more easily use the new mount API on modern kernels. 
Fixes: GHSA-cgrx-mc8f-2prm CVE-2025-52881 Co-developed-by: lifubang <lifubang@acmcoder.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:06 +11:00
Aleksa Sarai	77d217c7c3	init: write sysctls using safe procfs API sysctls could in principle also be used as a write gadget for arbitrary procfs files. As this requires getting a non-subset=pid /proc handle we amortise this by only allocating a single procfs handle for all sysctl writes. Fixes: GHSA-cgrx-mc8f-2prm CVE-2025-52881 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:05 +11:00
Aleksa Sarai	ff6fe13246	utils: use safe procfs for /proc/self/fd loop code From a safety perspective this might not be strictly required, but it paves the way for us to remove utils.ProcThreadSelf. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	01de9d65dc	rootfs: avoid using os.Create for new device inodes If an attacker were to make the target of a device inode creation be a symlink to some host path, os.Create would happily truncate the target which could lead to all sorts of issues. This exploit is probably not as exploitable because device inodes are usually only bind-mounted for rootless containers, which cannot overwrite important host files (though user files would still be up for grabs). The regular inode creation logic could also theoretically be tricked into changing the access mode and ownership of host files if the newly-created device inode was swapped with a symlink to a host path. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	77889b56db	internal: add wrappers for securejoin.Proc* Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	44a0fcf685	go.mod: update to github.com/cyphar/filepath-securejoin@v0.5.0 In order to avoid lint errors due to the deprecation of the top-level securejoin methods ported from libpathrs, we need to adjust internal/pathrs to use the new pathrs-lite subpackage instead. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:03 +11:00
Aleksa Sarai	531ef794e4	console: use TIOCGPTPEER when allocating peer PTY When opening the peer end of a pty, the old kernel API required us to open /dev/pts/$num inside the container (at least since we fixed console handling many years ago in commit `244c9fc426` ("*: console rewrite")). The problem is that in a hostile container it is possible for /dev/pts/$num to be an attacker-controlled symlink that runc can be tricked into resolving when doing bind-mounts. This allows the attacker to (among other things) persist /proc/... entries that are later masked by runc, allowing an attacker to escape through the kernel.core_pattern sysctl (/proc/sys/kernel/core_pattern). This is the original issue reported by Lei Wang and Li Fu Bang in CVE-2025-52565. However, it should be noted that this is not entirely a newly-discovered problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl, which allows us to get a pty peer without touching the /dev/pts inside the container. The original threat model was around an attacker replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS inode, or possibly a PTY they wanted a confused deputy to operate on). Unfortunately, there was no practical way for runc to cache a safe O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we could protect against the main attack TIOCGPTPEER was meant to protect against, we never switched to it (even though I implemented it specifically to harden container runtimes). Unfortunately, it turns out that mount sources are a threat we didn't fully consider. Since TIOCGPTPEER already solves this problem entirely for us in a race free way, we should just use that. In a later patch, we will add some hardening for /dev/pts/$num opening to maintain support for very old kernels (Linux 4.13 is very old at this point, but RHEL 7 is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565 Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565) Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565) Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER) Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:03 +11:00
Aleksa Sarai	ff94f9991b	*: switch to safer securejoin.Reopen filepath-securejoin v0.3 gave us a much safer re-open primitive, we should use it to avoid any theoretical attacks. Rather than using it directly, add a small pathrs wrapper to make libpathrs migrations in the future easier... Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:02 +11:00
Aleksa Sarai	6fc1914491	internal: move utils.MkdirAllInRoot to internal/pathrs We will have more wrappers around filepath-securejoin, and so move them to their own specific package so that we can eventually use libpathrs fairly cleanly (by swapping out the implementation). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:02 +11:00
Aleksa Sarai	db19bbed53	internal/sys: add VerifyInode helper This will be used for a few security patches in later patches in this patchset. The need to verify what kind of inode we are operating on in a race-free way turns out to be quite a common pattern... Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:01 +11:00
Aleksa Sarai	a672a5f36c	merge #4726 into opencontainers/runc:main Antti Kervinen (1): Add memory policy support LGTMs: lifubang AkihiroSuda cyphar	2025-10-08 05:18:13 +11:00
Antti Kervinen	eda7bdf80c	Add memory policy support Implement support for Linux memory policy in OCI spec PR: https://github.com/opencontainers/runtime-spec/pull/1282 Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>	2025-10-07 15:06:37 +03:00
Aleksa Sarai	627054d246	lint/revive: add package doc comments This silences all of the "should have a package comment" lint warnings from golangci-lint. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-10-03 15:17:43 +10:00
Kir Kolyshkin	491326cdeb	int/linux: add/use Recvfrom Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-26 14:16:53 -07:00
Kir Kolyshkin	e655abc0da	int/linux: add/use Dup3, Open, Openat Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-26 14:16:53 -07:00
Kir Kolyshkin	c690b66d7f	int/linux: add/use Exec Drop the libcontainer/system/exec, and use the linux.Exec instead. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-26 14:16:53 -07:00
Kir Kolyshkin	431b8bb4d8	int/linux: add/use Getwd Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-26 14:16:53 -07:00
Kir Kolyshkin	8cc1eb379b	Introduce and use internal/linux This package is to provide unix.* wrappers to ensure that: - they retry on EINTR; - a "rich" error is returned on failure. A first such wrapper, Sendmsg, is introduced. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-26 14:16:50 -07:00
Kir Kolyshkin	9cb59b4659	ci: rm "skip on CentOS 7" kludges We no longer test on CentOS 7. Remove the internal/testutil package as it has no other uses. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-11-07 13:16:16 -08:00
Kir Kolyshkin	a2f7c6add8	internal/testutil: create, add SkipOnCentOS CentOS 7 is showing its age and we'd rather skip some tests on it than find out why they are flaky. Add internal/testutil package, and move the generalized version of SkipOnCentOS7 from libcontainer/cgroups/devices to there. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-10-30 16:54:17 -07:00
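
Commit `a41366e740` describes a retry loop for -EAGAIN that stops at a deadline (2ms) instead of after a fixed number of attempts. The listing does not include the patch itself, so what follows is only a minimal sketch of that general idea. The names `retryEAGAIN`, `defaultDeadline`, and `flaky` are hypothetical and are not taken from runc or filepath-securejoin.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// defaultDeadline mirrors the 2ms figure quoted in the commit message.
// The constant name itself is an assumption made for this sketch.
const defaultDeadline = 2 * time.Millisecond

// retryEAGAIN calls fn until it returns something other than EAGAIN, or
// until the deadline has passed. Bounding by time rather than by attempt
// count keeps the worst-case latency fixed however fast each attempt is.
func retryEAGAIN(limit time.Duration, fn func() error) error {
	deadline := time.Now().Add(limit)
	for {
		err := fn()
		if !errors.Is(err, syscall.EAGAIN) {
			return err // success, or an error that retrying will not fix
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("gave up after %v: %w", limit, err)
		}
	}
}

// flaky returns an operation that fails with EAGAIN n times and then
// succeeds. It stands in for a path lookup that is being raced.
func flaky(n int) func() error {
	return func() error {
		if n > 0 {
			n--
			return syscall.EAGAIN
		}
		return nil
	}
}

func main() {
	// A handful of transient failures is normally absorbed by the loop.
	fmt.Println("transient:", retryEAGAIN(defaultDeadline, flaky(5)))
	// Persistent failures end once the deadline has passed.
	fmt.Println("persistent:", retryEAGAIN(defaultDeadline, flaky(1<<30)))
}
```

The deadline is checked only after a failed attempt, so an operation that succeeds on its first try pays for one extra clock read and nothing more.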

33 Commits
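
Commit `01de9d65dc` notes that `os.Create` will truncate whatever a symlink at the target path points to. The sketch below reproduces that behaviour in a temporary directory and contrasts it with exclusive creation (`O_CREAT|O_EXCL`), which refuses an existing symlink. This is a generic illustration and not runc's actual fix, which this listing does not show. The helper names `createExclusive` and `demo`, and the file names, are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createExclusive creates path only if nothing exists there. With
// O_CREAT|O_EXCL the kernel does not follow a symlink in the final
// component, so a planted symlink makes the call fail instead.
func createExclusive(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
}

// demo plants a symlink pointing at a stand-in "host" file, then tries both
// creation strategies. It returns the host file's contents afterwards, the
// error from the exclusive attempt, and any setup error.
func demo() (string, error, error) {
	dir, err := os.MkdirTemp("", "create-demo")
	if err != nil {
		return "", nil, err
	}
	defer os.RemoveAll(dir)

	host := filepath.Join(dir, "host-file")
	link := filepath.Join(dir, "new-inode")
	if err := os.WriteFile(host, []byte("important"), 0o644); err != nil {
		return "", nil, err
	}
	if err := os.Symlink(host, link); err != nil {
		return "", nil, err
	}

	// Exclusive creation sees the symlink and fails, leaving host-file intact.
	f, exclErr := createExclusive(link)
	if exclErr == nil {
		f.Close()
	}

	// os.Create follows the symlink and truncates host-file.
	f, err = os.Create(link)
	if err != nil {
		return "", exclErr, err
	}
	f.Close()

	data, err := os.ReadFile(host)
	return string(data), exclErr, err
}

func main() {
	content, exclErr, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("exclusive create:", exclErr)
	fmt.Printf("host file after os.Create: %q\n", content)
}
```

Exclusive creation only protects the final path component. Symlinks earlier in the path are still followed, which is why several commits above switch to fd-based handling.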