zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-10-20 22:19:42 +08:00

Author	SHA1	Message	Date
lfbzhm	95a93c132c	Merge pull request #4045 from fuweid/support-pidfd-socket [feature request] *: introduce pidfd-socket flag	2023-11-22 09:13:55 +08:00
Wei Fu	94505a046a	*: introduce pidfd-socket flag The container manager like containerd-shim can't use cgroup.kill feature or freeze all the processes in cgroup to terminate the exec init process. It's unsafe to call kill(2) since the pid can be recycled. It's good to provide the pidfd of init process through the pidfd-socket. It's similar to the console-socket. With the pidfd, the container manager like containerd-shim can send the signal to target process safely. And for the standard init process, we can have polling support to get exit event instead of blocking on wait4. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-11-21 18:28:50 +08:00
Aleksa Sarai	7c71a22705	rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT The original reasoning for this option was to avoid having mount options be overwritten by runc. However, adding command-line arguments has historically been a bad idea because it forces strict-runc-compatible OCI runtimes to copy out-of-spec features directly from runc and these flags are usually quite difficult to enable by users when using runc through several layers of engines and orchestrators. A far more preferable solution is to have a heuristic which detects whether copying the original mount's mount options would override an explicit mount option specified by the user. In this case, we should return an error. You only end up in this path in the userns case, if you have a bind-mount source with locked flags. During the course of writing this patch, I discovered that several aspects of our handling of flags for bind-mounts left much to be desired. We have completely botched the handling of explicitly cleared flags since commit `97f5ee4e6a` ("Only remount if requested flags differ from current"), with our behaviour only becoming increasingly more weird with `50105de1d8` ("Fix failure with rw bind mount of a ro fuse") and `da780e4d27` ("Fix bind mounts of filesystems with certain options set"). In short, we would only clear flags explicitly requested by the user purely by chance, in ways that it really should've been reported to us by now. The most egregious is that mounts explicitly marked "rw" were actually mounted "ro" if the bind-mount source was "ro" and no other special flags were included. In addition, our handling of atime was completely broken -- mostly due to how subtle the semantics of atime are on Linux. Unfortunately, while the runtime-spec requires us to implement mount(8)'s behaviour, several aspects of the util-linux mount(8)'s behaviour are broken and thus copying them makes little sense. 
Since the runtime-spec behaviour for this case (should mount options for a "bind" mount use the "mount --bind -o ..." or "mount --bind -o remount,..." semantics? Is the fallback code we have for userns actually spec-compliant?) and the mount(8) behaviour (see [1]) are not well-defined, this commit simply fixes the most obvious aspects of the behaviour that are broken while keeping the current spirit of the implementation. NOTE: The handling of atime in the base case is left for a future PR to deal with. This means that the atime of the source mount will be silently left alone unless the fallback path needs to be taken, and any flags not explicitly set will be cleared in the base case. Whether we should always be operating as "mount --bind -o remount,..." (where we default to the original mount source flags) is a topic for a separate PR and (probably) associated runtime-spec PR. So, to resolve this: * We store which flags were explicitly requested to be cleared by the user, so that we can detect whether the userns fallback path would end up setting a flag the user explicitly wished to clear. If so, we return an error because we couldn't fulfil the configuration settings. * Revert `97f5ee4e6a` ("Only remount if requested flags differ from current"), as missing flags do not mean we can skip MS_REMOUNT (in fact, missing flags are how you indicate a flag needs to be cleared with mount(2)). The original purpose of the patch was to fix the userns issue, but as mentioned above the correct mechanism is to do a fallback mount that copies the lockable flags from statfs(2). * Improve handling of atime in the fallback case by: - Correctly handling the returned flags in statfs(2). - Implement the MNT_LOCK_ATIME checks in our code to ensure we produce errors rather than silently producing incorrect atime mounts. 
* Improve the tests so we correctly detect all of these contingencies, including a general "bind-mount atime handling" test to ensure that the behaviour described here is accurate. This change also inlines the remount() function -- it was only ever used for the bind-mount remount case, and its behaviour is very bind-mount specific. [1]: https://github.com/util-linux/util-linux/issues/2433 Reverts: `97f5ee4e6a` ("Only remount if requested flags differ from current") Fixes: `50105de1d8` ("Fix failure with rw bind mount of a ro fuse") Fixes: `da780e4d27` ("Fix bind mounts of filesystems with certain options set") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-10-24 17:28:25 +11:00
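The conflict check described in the commit message above (refuse the userns fallback when it would set a flag the user explicitly cleared) can be sketched roughly as follows. This is a minimal illustration, not runc's actual code: the `mountFlags` type, the `flag*` constants, and the `fallbackFlags` helper are all hypothetical names, and the bit values are illustrative stand-ins rather than the kernel's `MS_*` constants.

```go
package main

import "fmt"

// mountFlags is a hypothetical bitmask type used only for this sketch.
type mountFlags uint

// Illustrative flag bits (assumed for the example; not the real MS_* values).
const (
	flagRO mountFlags = 1 << iota
	flagNoSuid
	flagNoDev
	flagNoExec
)

// fallbackFlags computes the flags for a fallback remount.
//   - requested: flags the user asked to set
//   - cleared:   flags the user explicitly asked to clear (e.g. "rw", "exec")
//   - srcLocked: locked flags found on the bind-mount source
//
// If a locked source flag was explicitly cleared by the user, the
// configuration cannot be honoured, so an error is returned instead of
// silently applying the locked flag.
func fallbackFlags(requested, cleared, srcLocked mountFlags) (mountFlags, error) {
	if conflict := cleared & srcLocked; conflict != 0 {
		return 0, fmt.Errorf("cannot clear locked mount flags %#x", uint(conflict))
	}
	return requested | srcLocked, nil
}

func main() {
	// The user asked for "rw" (RO explicitly cleared), but the source is locked read-only.
	if _, err := fallbackFlags(flagNoExec, flagRO, flagRO); err != nil {
		fmt.Println("error:", err)
	}

	// No conflict: the locked flags are simply carried over.
	flags, _ := fallbackFlags(flagNoExec, 0, flagNoDev|flagNoSuid)
	fmt.Printf("fallback flags: %#x\n", uint(flags))
}
```

The point of tracking `cleared` separately is that an absent flag in `requested` is ambiguous on its own: it could mean "not mentioned" or "explicitly turned off", and only the latter should turn the fallback into an error.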
Ruediger Pluem	da780e4d27	Fix bind mounts of filesystems with certain options set Currently bind mounts of filesystems with nodev, nosuid, noexec, noatime, relatime, strictatime, nodiratime options set fail in rootless mode if the same options are not set for the bind mount. For ro filesystems this was resolved by #2570 by remounting again with ro set. Follow the same approach for nodev, nosuid, noexec, noatime, relatime, strictatime, nodiratime but allow to revert back to the old behaviour via the new `--no-mount-fallback` command line option. Add a testcase to verify that bind mounts of filesystems with nodev, nosuid, noexec, noatime options set work in rootless mode. Add a testcase that mounts a nodev, nosuid, noexec, noatime filesystem with a ro flag. Add two further testcases that ensure that the above testcases would fail if the `--no-mount-fallback` command line option is set. * contrib/completions/bash/runc: Add `--no-mount-fallback` command line option for bash completion. * create.go: Add `--no-mount-fallback` command line option. * restore.go: Add `--no-mount-fallback` command line option. * run.go: Add `--no-mount-fallback` command line option. * libcontainer/configs/config.go: Add `NoMountFallback` field to the `Config` struct to store the command line option value. * libcontainer/specconv/spec_linux.go: Add `NoMountFallback` field to the `CreateOpts` struct to store the command line option value and store it in the libcontainer config. * utils_linux.go: Store the command line option value in the `CreateOpts` struct. * libcontainer/rootfs_linux.go: In case that `--no-mount-fallback` is not set try to remount the bind filesystem again with the options nodev, nosuid, noexec, noatime, relatime, strictatime or nodiratime if they are set on the source filesystem. * tests/integration/mounts_sshfs.bats: Add testcases and rework sshfs setup to allow specifying different mount options depending on the test case. 
Signed-off-by: Ruediger Pluem <ruediger.pluem@vodafone.com>	2023-07-28 16:32:02 -07:00
Kir Kolyshkin	d974b22ac4	create, run: amend final errors As the error may contain anything, it may not be clear to a user that the whole (create or run) operation failed. Amend the errors. Also, change the code flow in create to match that of run, so we don't have to add the fake "return nil" at the end. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-14 10:53:11 -07:00
Kir Kolyshkin	9ba2f65d6b	startContainer: minor refactor All three callers* of startContainer call revisePidFile and createSpec before calling it, so it makes sense to move those calls to inside of the startContainer, and drop the spec argument. * -- in fact restore does not call revisePidFile, but it should. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-14 10:53:11 -07:00
Akihiro Suda	5fb9b2a006	Merge pull request #3185 from kolyshkin/go117-build-tags Add go:build tags	2021-09-02 13:35:33 +09:00
Kir Kolyshkin	c5b0be78e8	Rm build tags from main pkg This was added by commit `5aa82c950` back in the day when we thought runc was going to be cross-platform. It's very clear now it's a Linux-only package. While at it, further clarify it in README that we're Linux only. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-30 20:15:01 -07:00
lifubang	cb824629ba	proposal: add --keep to runc run Signed-off-by: lifubang <lifubang@acmcoder.com>	2021-08-02 12:51:36 -07:00
Andrei Vagin	a4fcbfb704	Prepare startContainer() to have more action Currently startContainer() is used to create and to run a container. In the next patch it will be used to restore a container. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-05-01 21:55:57 +03:00
Ian Campbell	f5adb05bce	Add --preserve-fds=N to create and run This preserves the given number of file descriptors on top of the 3 stdio and the socket activation ($LISTEN_FDS=M) fds. If LISTEN_FDS is not set then [3..3+N) would be preserved by --preserve-fds=N. Given LISTEN_FDS=3 and --preserve-fds=5 then we would preserve fds [3, 11) (in addition to stdio). That's 3, 4 & 5 from LISTEN_FDS=3 and 6, 7, 8, 9 & 10 from --preserve-fds=5. Signed-off-by: Ian Campbell <ian.campbell@docker.com>	2017-02-20 11:50:18 +00:00
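The fd arithmetic in the `--preserve-fds` commit message above can be checked with a small sketch. The `preservedRange` helper is a hypothetical name introduced for illustration; it only mirrors the ranges stated in the message (3 stdio fds, then `LISTEN_FDS` fds, then the `--preserve-fds` fds) and is not taken from runc.

```go
package main

import "fmt"

// preservedRange returns the half-open range [lo, hi) of file descriptors
// that are kept in addition to stdio (fds 0, 1 and 2), given the number of
// socket-activation fds (LISTEN_FDS) and the --preserve-fds value.
func preservedRange(listenFDs, preserveFDs int) (lo, hi int) {
	const stdioFDs = 3 // stdin, stdout, stderr occupy fds 0..2
	lo = stdioFDs
	hi = stdioFDs + listenFDs + preserveFDs
	return lo, hi
}

func main() {
	// LISTEN_FDS unset, --preserve-fds=5
	lo, hi := preservedRange(0, 5)
	fmt.Printf("[%d, %d)\n", lo, hi)

	// LISTEN_FDS=3, --preserve-fds=5: the example from the commit message
	lo, hi = preservedRange(3, 5)
	fmt.Printf("[%d, %d)\n", lo, hi)
}
```

With `LISTEN_FDS=3` and `--preserve-fds=5` this yields `[3, 11)`, matching the message: fds 3 to 5 come from socket activation and fds 6 to 10 from `--preserve-fds`.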
Aleksa Sarai	c6d8a2f26f	merge branch 'pr-1158' Closes #1158 LGTMs: @hqhq @cyphar	2016-12-26 13:59:47 +11:00
Aleksa Sarai	7df64f8886	runc: implement --console-socket This allows for higher-level orchestrators to be able to have access to the master pty file descriptor without keeping the runC process running. This is key to having (detach && createTTY) with a _real_ pty created inside the container, which is then sent to a higher level orchestrator over an AF_UNIX socket. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:49:36 +11:00
Aleksa Sarai	244c9fc426	*: console rewrite This implements {createTTY, detach} and all of the combinations and negations of the two that were previously implemented. There are some valid questions about out-of-OCI-scope topics like !createTTY and how things should be handled (why do we dup the current stdio to the process, and how is that not a security issue). However, these will be dealt with in a separate patchset. In order to allow for late console setup, split setupRootfs into the "preparation" section where all of the mounts are created and the "finalize" section where we pivot_root and set things as ro. In between the two we can set up all of the console mountpoints and symlinks we need. We use two-stage synchronisation to ensure that when the syscalls are reordered in a suboptimal way, an out-of-place read() on the parentPipe will not gobble the ancillary information. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:49:36 +11:00
Zhang Wei	b517076907	Check args numbers before application start Add a general args number validator for all client commands. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-11-29 11:18:51 +08:00
Wang Long	8676c75442	Fix the pid-file option for runc run/exec/create command Signed-off-by: Wang Long <long.wanglong@huawei.com>	2016-11-02 14:08:32 +08:00
Aleksa Sarai	0636bdd45b	Merge pull request #874 from crosbymichael/keyring Add option to disable new session keys	2016-06-12 21:44:45 +10:00
Mrunal Patel	a753b06645	Replace github.com/codegangsta/cli by github.com/urfave/cli The package got moved to a different repository Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-06-06 11:47:20 -07:00
Michael Crosby	8c9db3a7a5	Add option to disable new session keys This adds a `--no-new-keyring` flag to run and create so that a new session keyring is not created for the container and the calling process's keyring is inherited. Fixes #818 Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-03 11:53:07 -07:00
Michael Crosby	6eba9b8ffb	Fix SystemError and env lookup Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:10:47 -07:00
Michael Crosby	efcd73fb5b	Fix signal handling for unit tests Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:10:47 -07:00
Michael Crosby	3fe7d7f31e	Add create and start command for container lifecycle Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:06:41 -07:00
Michael Crosby	75fb70be01	Rename start to run `runc run` is the command that will create and start a container in one single command. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:06:41 -07:00
Doug Davis	714ae2acc9	Add a 'start' command When any non-global-flag parameter appears on the command line make sure there's a "command" even in the 'start' (run) case to ensure it's not ambiguous as to what the arg is. For example, w/o this fix it's not clear if runc foo means 'foo' is the name of a config file or an unknown command. Or worse, you can't name a config file the same as ANY command, even future (yet to be created) commands. We should fix this now before we ship 1.0 and are forced to support this ambiguous case for a long time. Signed-off-by: Doug Davis <dug@us.ibm.com>	2015-08-21 15:26:34 -07:00
Fabio Kung	85f40c2bc7	container id is the cgroup name Without this, multiple runc containers can accidentally share the same cgroup(s) (and change each other's limits), when runc is invoked from the same directory (i.e.: same cwd on multiple runc executions). After these changes, each runc container will run on its own cgroup(s). Before, the only workaround was to invoke runc from a unique (temporary?) cwd for each container. Common cgroup configuration (and hierarchical limits) can be set by having multiple runc containers share the same cgroup parent, which is the cgroup of the process executing runc. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2015-08-10 16:41:39 -07:00
Shishir Mahajan	27bfd1e2d1	systemd integration with container runtime for supporting sd_notify protocol Signed-off-by: Shishir Mahajan <shishir.mahajan@redhat.com>	2015-07-22 15:53:26 -04:00
Jin-Hwan Jeong	cbee9e5050	wrong grammar: should never been --> should have never been	2015-07-08 16:55:23 +09:00
Michael Crosby	f4c35e70d1	Depend on Spec types from specs repository Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-07-02 13:45:27 -07:00
Marianna	5aa82c950d	Enable build on unsupported platforms Should compile now without errors but changes needed to be added for each system so it actually works. main_unsupported.go is a new file with all the unsupported commands Fixes #9 Signed-off-by: Marianna <mtesselh@gmail.com>	2015-06-29 17:03:44 -07:00
Michael Crosby	b2d9d99610	Only define a single process This removes the Processes slice and only allows for one process of the container. It also renames TTY to Terminal for a cross platform meaning. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-29 13:30:35 -07:00
Michael Crosby	1fa65466ea	Move linux specific options to subsection This moves the linux specific options into a "linux" {} section on the config. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-29 13:30:35 -07:00
Phil Estes	9c56596f24	Make startup errors a bit friendlier A couple minor changes to error handling in startup: 1. Don't dump full help/usage text when the only problem is `runc` wasn't started under root privileges 2. Check for rootfs and make error clear to user when it doesn't exist 3. Change fatal to logrus.Fatal to get nicer output with simple error message Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)	2015-06-25 00:19:44 -07:00
Michael Crosby	9fac183294	Initial commit of runc binary Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:34:13 -07:00

33 Commits