zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-12-24 11:50:58 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	652269729d	libc/int: use strings.Builder Generated by modernize@latest (v0.21.0). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-16 15:04:04 -08:00
Akihiro Suda	4dcda051da	Merge pull request #5055 from kolyshkin/mpol-2 libct/configs: mark MPOL_* constants as deprecated	2025-12-16 10:39:09 +09:00
Curd Becker	536e183451	Replace os.Is* error checking functions with their errors.Is counterpart Signed-off-by: Curd Becker <me@curd-becker.de>	2025-12-11 03:16:02 +01:00
Kir Kolyshkin	3741f9186d	libct/configs: mark MPOL_* constants as deprecated Alas, these new constants are already in v1.4.0 release so we can't remove those right away, but we can mark them as deprecated now and target removal for v1.5.0. So, - mark them as deprecated; - redefine via unix.MPOL_* counterparts; - fix the validator code to use unix.MPOL_* directly. This amends commit `a0e809a8`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-08 15:36:29 -08:00
Kir Kolyshkin	8a9b4dcda6	libct: mountFd: close mountFile on error Reported in issue 5008. Reported-by: Arina Cherednik <arinacherednik034@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-02 15:15:23 -08:00
Kir Kolyshkin	c24965b742	libct: newProcessComm: close fds on error Reported in issue 5008. Reported-by: Arina Cherednik <arinacherednik034@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-02 15:15:23 -08:00
Kir Kolyshkin	88f897160c	libct: startInitialization: add defer close This function calls Init what normally never returns, so the defer only works if there is an error and we can safely use it to close those fds we opened. This was done for most but not all fds. Reported in issue 5008. Reported-by: Arina Cherednik <arinacherednik034@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-02 15:15:23 -08:00
Kir Kolyshkin	1f1ff4be06	Merge pull request #5051 from cyphar/libct-utils-deprecated libct/utils: remove Deprecated functions	2025-12-02 15:06:01 -08:00
Akihiro Suda	64c3c8eea6	Merge pull request #4994 from kolyshkin/gofumpt-extra Enable gofumpt extra rules	2025-11-28 09:30:57 +09:00
Aleksa Sarai	a412bd93e9	libct/utils: remove Deprecated functions These were all marked for deprecation in runc 1.5.0, so remove them now to make sure we don't forget. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-28 11:11:11 +11:00
Aleksa Sarai	195e9551e4	pathrs: add MkdirAllParentInRoot helper While CreateInRoot supports hallucinating the target path, we do not use it directly when constructing device inode targets because we need to have different handling for mknod and bind-mounts. The solution is to simply have a more generic MkdirAllParentInRoot helper that MkdirAll's the parent directory of the target path and then allows the caller to create the trailing component however they like. (This can be used by CreateInRoot internally as well!) Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:05 +11:00
Aleksa Sarai	cfb74326be	pathrs: add "hallucination" helpers for SecureJoin magic In order to maintain compatibility with previous releases of runc (which permitted dangling symlinks as path components by permitting non-existent path components to be treated like real directories) we have to first do SecureJoin to construct a target path that is compatible with the old behaviour but has all dangling symlinks (or other invalid paths like ".." components after non-existent directories) removed. This is effectively a more generic verison of commit `3f925525b4` ("rootfs: re-allow dangling symlinks in mount targets") and will let us remove the need for open-coding SecureJoin workarounds. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:05 +11:00
Aleksa Sarai	20c5a8ec4a	pathrs: rename MkdirAllInRootOpen -> MkdirAllInRoot Now that MkdirAllInRoot has been removed, we can make MkdirAllInRootOpen less wordy by renaming it to MkdirAllInRoot. This is a non-functional change. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:04 +11:00
Aleksa Sarai	9dbd37e06f	libct: switch final WithProcfd users to WithProcfdFile This probably should've been done as part of commit `d40b3439a9` ("rootfs: switch to fd-based handling of mountpoint targets") but it seems I missed them when doing the rest of the conversions. This also lets us remove utils.WithProcfd entirely, as well as pathrs.MkdirAllInRoot. Unfortunately, WithProcfd was exposed in the externally-importable "libcontainer/utils" package and so we need to have a deprecation notice to remove it in runc 1.5. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:03:30 +11:00
Aleksa Sarai	42a1e19d67	libcontainer: move CleanPath and StripRoot to internal/pathrs These helpers will be needed for the compatibility code added in future patches in this series, but because "internal/pathrs" is imported by "libcontainer/utils" we need to move them so that we can avoid circular dependencies. Because the old functions were in a non-internal package it is possible some downstreams use them, so add some wrappers but mark them as deprecated. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:03:29 +11:00
Akihiro Suda	475473d869	Merge pull request #5026 from lifubang/ci-detect-fdleak-try-best fix fd leaks and detect them as comprehensively as possible	2025-11-26 08:33:53 +09:00
Kir Kolyshkin	f944ccecb2	runc create/run/exec: show fatal errors from init In case early stage of runc init (nsenter) fails for some reason, it logs error(s) with FATAL log level, via bail(). The runc init log is read by a parent (runc create/run/exec) and is logged via normal logrus mechanism, which is all fine and dandy, except when `runc init` fails, we return the error from the parent (which is usually not too helpful, for example): runc run failed: unable to start container process: can't get final child's PID from pipe: EOF Now, the actual underlying error is from runc init and it was logged earlier; here's how full runc output looks like: FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF The problem is, upper level runtimes tend to ignore everything except the last line from runc, and thus error reported by e.g. docker is not very helpful. This patch tries to improve the situation by collecting FATAL errors from runc init and appending those to the error returned (instead of logging). With it, the above error will look like this: ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state Yes, it is long and ugly, but at least the upper level runtime will report it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-22 12:11:20 +07:00
lifubang	6ac151d69b	libct: add a defer fd close in createDeviceNode Signed-off-by: lifubang <lifubang@acmcoder.com>	2025-11-20 19:43:22 +08:00
lifubang	69785c117c	libct: always close m.dstFile in mountToRootfs Signed-off-by: lifubang <lifubang@acmcoder.com>	2025-11-20 19:43:22 +08:00
Kir Kolyshkin	5fbc3bb019	libct/int: TestFdLeaks: deflake Since the recent CVE fixes, TestFdLeaksSystemd sometimes fails: === RUN TestFdLeaksSystemd exec_test.go:1750: extra fd 9 -> /12224/task/13831/fd exec_test.go:1753: found 1 extra fds after container.Run --- FAIL: TestFdLeaksSystemd (0.10s) It might have been caused by the change to the test code in commit `ff6fe13` ("utils: use safe procfs for /proc/self/fd loop code") -- we are now opening a file descriptor during the logic to get a list of file descriptors. If the file descriptor happens to be allocated to a different number, you'll get an error. Let's try to filter out the fd used to read a directory. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-20 15:36:14 +08:00
Aleksa Sarai	3b75374cc7	runtime-spec: update pids.limit handling to match new guidance The main update is actually in github.com/opencontainers/cgroups, but we need to also update runtime-spec to a newer pre-release version to get the updates from there as well. In short, the behaviour change is now that "0" is treated as a valid value to set in "pids.max", "-1" means "max" and unset/nil means "do nothing". As described in the opencontainers/cgroups PR, this change is actually backwards compatible because our internal state.json stores PidsLimit, and that entry is marked as "omitempty". So, an old runc would omit PidsLimit=0 in state.json, and this will be parsed by a new runc as being "nil" -- and both would treat this case as "do not set anything". Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-11 15:15:27 +11:00
Kir Kolyshkin	67840cce4b	Enable gofumpt extra rules Commit `b2f8a74d` "clothed" the naked return as inflicted by gofumpt v0.9.0. Since gofumpt v0.9.2 this rule was moved to "extra" category, not enabled by default. The only other "extra" rule is to group adjacent parameters with the same type, which also makes sense. Enable gofumpt "extra" rules, and reformat the code accordingly. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-10 13:18:45 -08:00
Aleksa Sarai	eec1f7e34b	merge #4973 into opencontainers/runc:main Aleksa Sarai (1): rootfs: only set mode= for tmpfs mount if target already existed LGTMS: lifubang thaJeztah	2025-11-11 03:10:01 +11:00
Aleksa Sarai	a0e809a8ba	libct: switch to unix.SetMemPolicy wrapper This is mostly a mechanical change, but we also need to change some types to match the "mode int" argument that golang.org/x/sys/unix decided to use. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-10 16:03:02 +11:00
Aleksa Sarai	9a9719eeb4	rootfs: only set mode= for tmpfs mount if target already existed This was always the intended behaviour but commit `72fbb34f50` ("rootfs: switch to fd-based handling of mountpoint targets") regressed it when adding a mechanism to create a file handle to the target if it didn't already exist (causing the later stat to always succeed). A lot of people depend on this functionality, so add some tests to make sure we don't break it in the future. Fixes: `72fbb34f50` ("rootfs: switch to fd-based handling of mountpoint targets") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-08 23:11:57 +11:00
Kir Kolyshkin	1b954f1f06	libct: fix mips compilation On MIPS arches, Rdev is uint32 so we have to convert it. Fixes issue 4962. Fixes: `8476df83` ("libct: add/use isDevNull, verifyDevNull") Fixes: `de87203e` ("console: verify /dev/pts/ptmx before use") Fixes: `398955bc` ("console: add fallback for pre-TIOCGPTPEER kernels") Reported-by: Tianon Gravi <admwiggin@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-05 17:56:14 -08:00
Aleksa Sarai	3f925525b4	rootfs: re-allow dangling symlinks in mount targets It seems there are a fair few images where dangling symlinks are used as path components for mount targets, which pathrs-lite does not support (and it would be difficult to fully support this in a race-free way). This was actually meant to be blocked by commit `63c2908164` ("rootfs: try to scope MkdirAll to stay inside the rootfs"), followed by commit `dd827f7b71` ("utils: switch to securejoin.MkdirAllHandle"). However, we still used SecureJoin to construct mountpoint targets, which means that dangling symlinks were "resolved" before reaching pathrs-lite. This patch basically re-adds this hack in order to reduce the breakages we've seen so far. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-05 18:58:07 +11:00
Aleksa Sarai	d40b3439a9	rootfs: switch to fd-based handling of mountpoint targets An attacker could race with us during mount configuration in order to trick us into mounting over an unexpected path. This would bypass checkProcMount() and would allow for security profiles to be left unapplied by mounting over /proc/self/attr/... (or even more serious outcomes such as killing the entire system by tricking runc into writing strings to /proc/sysrq-trigger). This is a larger issue with our current mount infrastructure, and the ideal solution would be to rewrite it all to be fd-based (which would also allow us to support the "new" mount API, which also avoids a bunch of other issues with mount(8)). However, such a rewrite is not really workable as a security fix, so this patch is a bit of a compromise approach to fix the issue while also moving us a bit towards that eventual end-goal. The core issue in CVE-2025-52881 is that we currently use the (insecure) SecureJoin to re-resolve mountpoint target paths multiple times during mounting. Rather than generating a string from createMountpoint(), we instead open an os.File handle to the target mountpoint directly and then operate on that handle. This will make it easier to remove utils.WithProcfd() and rework mountViaFds() in the future. The only real issue we need to work around is that we need to re-open the mount target after doing the mount in order to get a handle to the mountpoint -- pathrs.Reopen() doesn't work in this case (it just re-opens the inode under the mountpoint) so we need to do a naive re-open using the full path. Note that if we used move_mount(2) this wouldn't be a problem because we would have a handle to the mountpoint itself. Note that this is still somewhat of a temporary solution -- ideally mountViaFds would use os.File directly to let us avoid some other issues with using bare /proc/... paths, as well as also letting us more easily use the new mount API on modern kernels. Fixes: GHSA-cgrx-mc8f-2prm CVE-2025-52881 Co-developed-by: lifubang <lifubang@acmcoder.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:06 +11:00
lifubang	4b37cd93f8	libct: align param type for mountCgroupV1/V2 functions Signed-off-by: lifubang <lifubang@acmcoder.com>	2025-11-01 21:24:05 +11:00
Aleksa Sarai	d61fd29d85	libct/system: use securejoin for /proc/$pid/stat Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:05 +11:00
Aleksa Sarai	435cc81be6	init: use securejoin for /proc/self/setgroups Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:05 +11:00
Aleksa Sarai	77d217c7c3	init: write sysctls using safe procfs API sysctls could in principle also be used as a write gadget for arbitrary procfs files. As this requires getting a non-subset=pid /proc handle we amortise this by only allocating a single procfs handle for all sysctl writes. Fixes: GHSA-cgrx-mc8f-2prm CVE-2025-52881 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:05 +11:00
Aleksa Sarai	b3dd1bc562	utils: remove unneeded EnsureProcHandle All of the callers of EnsureProcHandle now use filepath-securejoin's ProcThreadSelf to get a file handle, which has much stricter verification to avoid procfs attacks than EnsureProcHandle's very simplistic filesystem type check. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:05 +11:00
Aleksa Sarai	ff6fe13246	utils: use safe procfs for /proc/self/fd loop code From a safety perspective this might not be strictly required, but it paves the way for us to remove utils.ProcThreadSelf. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	fdcc9d3cad	apparmor: use safe procfs API for labels EnsureProcHandle only protects us against a tmpfs mount, but the risk of a procfs path being used (such as /proc/self/sched) has been known for a while. Now that filepath-securejoin has a reasonably safe procfs API, switch to it. Fixes: GHSA-cgrx-mc8f-2prm CVE-2025-52881 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	aee7d3fe35	ci: add lint to forbid the usage of os.Create os.Create is shorthand for open(O_CREAT\|O_TRUNC) without O_EXCL, which is incredibly unsafe for us to do when interacting with a container rootfs (especially before pivot_root) as an attacker could swap the target path with a symlink that points to the host filesystem, causing us to delete the contents of or create host files. We did have a similar bug in CVE-2024-45310, but in that case we (luckily) didn't have O_TRUNC set which avoided the worst possible case. However, os.Create does set O_TRUNC and we were using it in scenarios that may have been exploitable. Because of how easy it us for us to accidentally introduce this kind of bug, we should simply not allow the usage of os.Create in our entire codebase. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	01de9d65dc	rootfs: avoid using os.Create for new device inodes If an attacker were to make the target of a device inode creation be a symlink to some host path, os.Create would happily truncate the target which could lead to all sorts of issues. This exploit is probably not as exploitable because device inodes are usually only bind-mounted for rootless containers, which cannot overwrite important host files (though user files would still be up for grabs). The regular inode creation logic could also theoretically be tricked into changing the access mode and ownership of host files if the newly-created device inode was swapped with a symlink to a host path. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Aleksa Sarai	de87203e62	console: verify /dev/pts/ptmx before use This is primarily done out of an abudance of caution against runc exec being attacked by a container where /dev/pts/ptmx has been replaced with some other bad inode (a disconnected NFS handle, a symlink that goes through a leaked runc file descriptor to reference a host ptmx, etc). Unfortunately, we cannot trivially verify that /dev/pts/ptmx is actually the /dev/pts from the container without storing stuff like the fsid in the runc state.json, which is probably not worth the extra effort. This should at least avoid the most concerning cases. Reported-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:03 +11:00
Aleksa Sarai	9be1dbf4ac	console: avoid trivial symlink attacks for /dev/console An attacker could make /dev/console a symlink. This presents two possible issues: 1. os.Create will happily truncate targets, which could have resulted in a worse version of CVE-2024-4531. Luckily, this all happens after pivot_root(2) so the scope of that particular attack is fairly limited (you are unlikely to be able to easily access host rootfs files -- though it might be possible to take advantage of leaks such as in CVE-2024-21626). However, O_CREAT\|O_NOFOLLOW is what we should be doing for all file creations. 2. Because we passed /dev/console as the only mount path (as opposed to using a /proc/self/fd/$n path), an attacker could swap the symlink to point to any other path and thus cause us to mount over some other path. This is not as big of a problem because all the mounts are in the container namespace after pivot_root(2), and users usually can create arbitrary mount targets inside the container. These issues don't seem particularly exploitable, but they deserve to be hardened regardless. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:03 +11:00
Aleksa Sarai	398955bccb	console: add fallback for pre-TIOCGPTPEER kernels The pty driver has very consistent allocation rules for the major:minor numbers of /dev/pts/$n inodes, so it is possible to somewhat safely open /dev/pts/* paths if we validate that the inode is the one we expect. It is possible for an attacker to have over-mounted a pts peer from a different devpts instance, but to fix this would require more tracking of devpts instances than runc currently can do. This means runc should continue to work on very old kernels. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:03 +11:00
Aleksa Sarai	531ef794e4	console: use TIOCGPTPEER when allocating peer PTY When opening the peer end of a pty, the old kernel API required us to open /dev/pts/$num inside the container (at least since we fixed console handling many years ago in commit `244c9fc426` (": console rewrite")). The problem is that in a hostile container it is possible for /dev/pts/$num to be an attacker-controlled symlink that runc can be tricked into resolving when doing bind-mounts. This allows the attacker to (among other things) persist /proc/... entries that are later masked by runc, allowing an attacker to escape through the kernel.core_pattern sysctl (/proc/sys/kernel/core_pattern). This is the original issue reported by Lei Wang and Li Fu Bang in CVE-2025-52565. However, it should be noted that this is not entirely a newly-discovered problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl, which allows us to get a pty peer without touching the /dev/pts inside the container. The original threat model was around an attacker replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS inode, or possibly a PTY they wanted a confused deputy to operate on). Unfortunately, there was no practical way for runc to cache a safe O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we could protect against the main attack TIOCGPTPEER was meant to protect against, we never switched to it (even though I implemented it specifically to harden container runtimes). Unfortunately, It turns out that mount sources* are a threat we didn't fully consider. Since TIOCGPTPEER already solves this problem entirely for us in a race free way, we should just use that. In a later patch, we will add some hardening for /dev/pts/$num opening to maintain support for very old kernels (Linux 4.13 is very old at this point, but RHEL 7 is still kicking and is stuck on Linux 3.10). Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565 Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565) Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565) Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER) Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:03 +11:00
Kir Kolyshkin	5d7b242407	libct: maskPaths: don't rely on ENOTDIR for mount Currently, we rely on mount returning ENOTDIR when the destination is a directory (and so mount tells us that the source is not), and fall back to read-only tmpfs bind mount for such cases. Theoretically, ENOTDIR can also be returned in some other cases, resulting in the wrong type of mount being used. Let's be more straightforward here -- call fstat on destination file descriptor, and use the proper mount depending on whether it is a directory. Reported-by: Rodrigo Campos <rodrigoca@microsoft.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-01 21:24:02 +11:00
Kir Kolyshkin	1a30a8f3d9	libct: maskPaths: only ignore ENOENT on mount dest When mounting a path being masked, the /dev/null might disappear from under us, and mount (even on an opened /dev/null file descriptor) will return ENOENT, which we deliberately ignore, as there's no need to mask non-existent paths. Let's open the destination path and ignore ENOENT during open, then mount via the destination file descriptor, not ignoring ENOENT. Reported-by: lifubang <lifubang@acmcoder.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-01 21:24:02 +11:00
Kir Kolyshkin	8476df83b5	libct: add/use isDevNull, verifyDevNull The /dev/null in a container should not be trusted, because when /dev is a bind mount, /dev/null is not created by runc itself. 1. Add isDevNull which checks the fd minor/major and device type, and verifyDevNull which does the stat and the check. 2. Rewrite maskPath to open and check /dev/null, and use its fd to perform mounts. Move the loop over the MaskPaths into the function, and rename it to maskPaths. 3. reOpenDevNull: use verifyDevNull and isDevNull. 4. fixStdioPermissions: use isDevNull instead of stat. Fixes: GHSA-9493-h29p-rfm2 CVE-2025-31133 Co-authored-by: Rodrigo Campos <rodrigoca@microsoft.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:02 +11:00
Aleksa Sarai	ff94f9991b	*: switch to safer securejoin.Reopen filepath-securejoin v0.3 gave us a much safer re-open primitive, we should use it to avoid any theoretical attacks. Rather than using it direcly, add a small pathrs wrapper to make libpathrs migrations in the future easier... Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:02 +11:00
Aleksa Sarai	6fc1914491	internal: move utils.MkdirAllInRoot to internal/pathrs We will have more wrappers around filepath-securejoin, and so move them to their own specific package so that we can eventually use libpathrs fairly cleanly (by swapping out the implementation). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:02 +11:00
Kir Kolyshkin	6c18b25cdc	libct/nsenter: better read/write errors Introduce and use iobail, xread, and xwrite wrappers so that we can properly check read/write return value and call either bail or bailx on error, with proper diagnostics (distinguishing failed read/write from a short read/write). This prevents the "Success" prefix in errors like: failed to sync with stage-1: next state: Success Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-10-28 17:26:09 -07:00
Kir Kolyshkin	aea52d0ab0	libct/nsenter: sprinkle missing sane_kill Add a few missing sane_kill calls where they make sense. Remove one useless sane_kill of stage2_pid, as during SYNC_USERMAP stage2 is not yet started. It is harmless yet it makes the code slightly harder to read. Set the child pid to -1 upon receiving SYNC_CHILD_FINISH to minimize the chances of killing an unrelated process. When a child sends SYNC_CHILD_FINISH it is about to exit (although theoretically it could be stuck during debug logging). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-10-28 17:21:38 -07:00
Kir Kolyshkin	067b8335e7	libct/nsenter: add and use bailx We use bail to report fatal errors, and bail always append %m (aka strerror(errno)). In case an error condition did not set errno, the log message will end up with ": Success" or an error from a stale errno value. Either case is confusing for users. Introduce bailx which is the same as bail except it does not append %m, and use it where appropriate. The naming follows libc's err(3) and errx(3). PS we still use bail in a few cases after read or write, even if that read/write did not return an error, because the code does not distinguish between short read/write and error (-1). This will be addressed by the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-10-28 17:21:38 -07:00
Kir Kolyshkin	9c8f476cb6	libct/nsenter: save errno in sane_kill Since sane_kill after a failed read or write, but before reporting the error from that read or write, it may change the errno value in case kill(2) fails. Save and restore the errno around the call to kill. While at it, - change the code to return early; - don't return kill return value as no one is using it, and the errno value no longer correlates. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-10-28 17:21:25 -07:00

1 2 3 4 5 ...

3014 Commits