zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-10-05 23:46:57 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	6a3fe1618f	libcontainer: remove LinuxFactory Since LinuxFactory has become the means to specify containers state top directory (aka --root), and is only used by two methods (Create and Load), it is easier to pass root to them directly. Modify all the users and the docs accordingly. While at it, fix Create and Load docs (those that were originally moved from the Factory interface docs). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-03-22 23:44:31 -07:00
Kir Kolyshkin	40b0088681	loadFactory: remove Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-18 16:05:29 -08:00
Kir Kolyshkin	36786c361a	list, utils: remove redundant code The value of root is already an absolute path since commit `ede8a86ec1`, so it does not make sense to call filepath.Abs() again. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-18 16:05:29 -08:00
Kir Kolyshkin	dbd990d555	libct: rm intelrdt.Manager interface, NewIntelRdtManager Remove intelrdt.Manager interface, since we only have a single implementation, and do not expect another one. Rename intelRdtManager to Manager, and modify its users accordingly. Remove NewIntelRdtManager from factory. Remove IntelRdtfs. Instead, make intelrdt.NewManager return nil if the feature is not available. Remove TestFactoryNewIntelRdt as it is now identical to TestFactoryNew. Add internal function newManager to be used for tests (to make sure some testing is done even when the feature is not available in kernel/hardware). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-03 17:33:03 -08:00
Kir Kolyshkin	39bd7b7217	libct: Container, Factory: rm newuidmap/newgidmap These were introduced in commit `d8b669400` back in 2017, with a TODO of "make binary names configurable". Apparently, everyone is happy with the hardcoded names. In fact, they are configurable (by prepending the PATH with a directory containing own version of newuidmap/newgidmap). Now, these binaries are only needed in a few specific cases (when rootless is set etc.), so let's look them up only when needed. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-03 11:40:29 -08:00
Kir Kolyshkin	6e1d476aad	runc: remove --criu option This was introduced in an initial commit, back in the day when criu was a highly experimental thing. Today it's not; most users who need it have it packaged by their distro vendor. The usual way to run a binary is to look it up in directories listed in $PATH. This is flexible enough and allows for multiple scenarios (custom binaries, extra binaries, etc.). This is the way criu should be run. Make --criu a hidden option (thus removing it from help). Remove the option from man pages, integration tests, etc. Remove all traces of CriuPath from data structures. Add a warning that --criu is ignored and will be removed. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-01-26 20:25:56 -08:00
Kir Kolyshkin	86733013cc	notify_socket: setupSpec: drop ctx arg and return value Those were never used (ctx was added by the initial commit, and error was added by commit `25fd4a6757`). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-29 20:10:22 -08:00
Kir Kolyshkin	3648346572	tty: ClosePostStart: rm return value It is not and was not ever used. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-29 20:10:22 -08:00
Kir Kolyshkin	f3f4b6d155	tty: recvtty: rm process arg It is not used since commit `00a0ecf554`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-29 20:10:22 -08:00
Kir Kolyshkin	e63186351b	tty: rm inheritStdio return value Since commit `eebdb644f9` this function never returns any error. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-29 20:10:22 -08:00
Kir Kolyshkin	d23b810927	checkpoint: rm getDefaultImagePath arg It was never used. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-29 20:10:22 -08:00
Kir Kolyshkin	0202c398ff	runc exec: implement --cgroup In some setups, multiple cgroups are used inside a container, and sometimes there is a need to execute a process in a particular sub-cgroup (in case of cgroup v1, for a particular controller). This is what this commit implements. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-27 10:25:42 -07:00
Kir Kolyshkin	097c6d7425	libct/cg: simplify getting cgroup manager 1. Make Rootless and Systemd flags part of config.Cgroups. 2. Make all cgroup managers (not just fs2) return error (so it can do more initialization -- added by the following commits). 3. Replace complicated cgroup manager instantiation in factory_linux by a single (and simple) libcontainer/cgroups/manager.New() function. 4. getUnifiedPath is simplified to check that only a single path is supplied (rather than checking that other paths, if supplied, are the same). [v2: can't -> cannot] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-23 09:11:44 -07:00
Kir Kolyshkin	9ba2f65d6b	startContainer: minor refactor All three callers* of startContainer call revisePidFile and createSpec before calling it, so it makes sense to move those calls to inside of the startContainer, and drop the spec argument. * -- in fact restore does not call revisePidFile, but it should. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-14 10:53:11 -07:00
Kir Kolyshkin	6c4a3b13d1	runc init: pass _LIBCONTAINER_LOGLEVEL as int Instead of passing _LIBCONTAINER_LOGLEVEL as a string (like "debug" or "info"), use a numeric value. Also, simplify the init log level passing code -- since we actually use the same level as the runc binary, just get it from logrus. This is a preparation for the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-09 14:57:20 -07:00
Kir Kolyshkin	0a3577c680	utils_linux: simplify newProcess newProcess does not need those extra arguments; they can be handled in the caller. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-09 14:57:20 -07:00
Akihiro Suda	5fb9b2a006	Merge pull request #3185 from kolyshkin/go117-build-tags Add go:build tags	2021-09-02 13:35:33 +09:00
Kir Kolyshkin	9ff64c3d97	*: rm redundant linux build tag For files that end with _linux.go or _linux_test.go, there is no need to specify linux build tag, as it is assumed from the file name. In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go for the file name to make sense. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-30 20:15:00 -07:00
lifubang	cb824629ba	proposal: add --keep to runc run Signed-off-by: lifubang <lifubang@acmcoder.com>	2021-08-02 12:51:36 -07:00
Kir Kolyshkin	a7cfb23b88	*: stop using pkg/errors Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-22 16:09:47 -07:00
Kir Kolyshkin	e6048715e4	Use gofumpt to format code gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules. Brought to you by git ls-files \*.go \| grep -v ^vendor/ \| xargs gofumpt -s -w Looking at the diff, all these changes make sense. Also, replace gofmt with gofumpt in golangci.yml. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-01 12:17:27 -07:00
Kir Kolyshkin	719d70d2e3	setupIO: simplify code There's no need to check err for nil. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-01-07 13:33:41 -08:00
Xiaochen Shen	325a74ddec	libcontainer/intelrdt: rm init() from intelrdt.go Use sync.Once to init Intel RDT when needed for a small speedup to operations which do not require Intel RDT. Simplify IntelRdtManager initialization in LinuxFactory. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2020-12-16 23:37:31 +08:00
Xiaochen Shen	f62ad4a0de	libcontainer/intelrdt: rename CAT and MBA enabled flags Rename CAT and MBA enabled flags to be consistent with others. No functional change. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2020-11-10 15:32:01 +08:00
Sebastiaan van Stijn	8bf216728c	use string-concatenation instead of sprintf for simple cases Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-09-30 10:51:59 +02:00
Kir Kolyshkin	b79cb04886	runc run/exec: fix terminal wrt stdin redirection This fixes the following failure: > sudo runc run -b bundle ctr </dev/null > WARN[0000] exit status 2 > ERRO[0000] container_linux.go:367: starting container process caused: process_linux.go:459: container init caused: The "exit status 2" with no error message is caused by SIGHUP which is sent to init by the kernel when we are losing the controlling terminal. If we choose to ignore that, we'll get panic in console.Current(), which is addressed by [1]. Otherwise, the issue here is simple: the code assumes stdin is opened to a terminal, and fails to work otherwise. Some standard Linux tools (e.g. stty, top) do the same (modulo panic), while some others (reset, tput) use the trick of trying all the three std streams (starting with stderr as it is least likely to be redirected), and if all three fail, open /dev/tty. This commit does a similar thing (see initHostConsole). It also replaces the call to console.Current(), which may panic (see [1]), by reusing the t.hostConsole. Finally, a simple test case is added. Fixes: https://github.com/opencontainers/runc/issues/2485 [1] https://github.com/containerd/console/pull/37 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-08-20 07:56:20 -07:00
zvier	92e2175de1	cleancode: clean code for utils_linux.go Signed-off-by: Jeff Zvier <zvier20@gmail.com>	2020-07-23 06:12:27 +08:00
John Hwang	5aa0601a59	validateProcessSpec: prevent SEGV when config is valid json, but invalid. Signed-off-by: John Hwang <John.F.Hwang@gmail.com>	2020-05-18 09:38:22 -07:00
John Hwang	7fc291fd45	Replace formatted errors when unneeded Signed-off-by: John Hwang <John.F.Hwang@gmail.com>	2020-05-16 18:13:21 -07:00
Kir Kolyshkin	2b31437caa	Merge pull request #2281 from AkihiroSuda/rootless-systemd cgroup v2: support rootless systemd LGTMs: kolyshkin, mrunalp	2020-05-07 21:45:52 -07:00
Akihiro Suda	bf15cc99b1	cgroup v2: support rootless systemd Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 . $ podman --cgroup-manager=systemd run -it --rm --runtime=runc \ --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine / # cat /proc/self/cgroup 0::/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max 44040192 / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max 42000 100000 / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max 42 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-05-08 12:39:20 +09:00
Kir Kolyshkin	c52a598d74	Remove fatalf() It was only used in one place, all others are happy with `fatal(fmt.Errorf())`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-02 16:19:14 -07:00
Mrunal Patel	33c6125da6	systemd: Export IsSystemdRunning() function Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2020-03-30 15:24:06 -07:00
Akihiro Suda	cc183ca662	Merge pull request #2242 from AkihiroSuda/vendor-systemd vendor: update go-systemd and godbus	2020-03-25 02:40:22 +09:00
Ted Yu	0a7762c664	Avoid duplicate calls to runner#destroy Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-03-23 09:04:38 -07:00
Akihiro Suda	492d525e55	vendor: update go-systemd and godbus Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-16 13:26:03 +09:00
Giuseppe Scrivano	25fd4a6757	sd-notify: do not hang when NOTIFY_SOCKET is used with create if NOTIFY_SOCKET is used, do not block the main runc process waiting for events on the notify socket. Bind mount the parent directory of the notify socket, so that "start" can create the socket and it is still accessible from the container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-03-12 21:21:05 +01:00
Mrunal Patel	eb4aeed24f	Merge pull request #2038 from imxyb/defer-destroy `r.destroy` can defer exec in `runner.run` method.	2019-05-07 15:48:14 -07:00
Georgi Sabev	ba3cabf932	Improve nsexec logging * Simplify logging function * Logs contain __FUNCTION__:__LINE__ * Bail uses write_log Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com> Co-authored-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2019-04-22 17:53:52 +03:00
Xiao YongBiao	da5a2dd456	`r.destroy` can defer exec in `runner.run` method. Signed-off-by: Xiao YongBiao <xyb4638@gmail.com>	2019-04-10 23:25:03 +08:00
lifubang	3e6688f5c9	add selinux label for runc exec Signed-off-by: lifubang <lifubang@acmcoder.com>	2019-04-03 12:09:06 +08:00
lifubang	7cb3cde1f4	fix preserve-fds flag may cause runc hang Signed-off-by: lifubang <lifubang@acmcoder.com>	2019-03-01 17:15:17 +08:00
Xiaochen Shen	27560ace2f	libcontainer: intelrdt: add support for Intel RDT/MBA in runc Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttle over memory bandwidth for the software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel config CONFIG_INTEL_RDT. If the hardware supports it, CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ \|-- info \| \|-- L3 \| \| \|-- cbm_mask \| \| \|-- min_cbm_bits \| \| \|-- num_closids \| \|-- MB \| \|-- bandwidth_gran \| \|-- delay_linear \| \|-- min_bandwidth \| \|-- num_closids \|-- ... \|-- schemata \|-- tasks \|-- <container_id> \|-- ... \|-- schemata \|-- tasks For MBA support for `runc`, we will reuse the infrastructure and code base of Intel RDT/CAT which was implemented in #1279. We could also make use of `tasks` and `schemata` configuration for memory bandwidth resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format. Memory bandwidth schema: It has allocation values for memory bandwidth on each socket, which contains L3 cache id and memory bandwidth percentage. Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware. For more information about Intel RDT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the minimum memory bandwidth is 10% with a memory bandwidth granularity of 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. "linux": { "intelRdt": { "memBwSchema": "MB:0=20;1=70" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:29 +08:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost the same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto\|true\|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Mrunal Patel	bd3c4f844a	Fix race in runc exec There is a race in runc exec when the init process stops just before the check for the container status. It is then wrongly assumed that we are trying to start an init process instead of an exec process. This commit adds an Init field to libcontainer Process to distinguish between init and exec processes to prevent this race. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2018-06-01 16:25:58 -07:00
Akihiro Suda	63bb0fe9d0	Fix merge conflict Caused by: * #1688 `0e561642f8` * #1759 `dd67ab10d7` Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-05-30 11:25:43 +09:00
Michael Crosby	0e561642f8	Merge pull request #1688 from AkihiroSuda/unshare-m-r main: support rootless mode in userns	2018-05-29 15:41:17 -04:00
Akihiro Suda	cdb7f23d21	main: add condition to isRootless() Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-05-10 12:17:37 +09:00
Akihiro Suda	f103de57ec	main: support rootless mode in userns Running rootless containers in userns is useful for mounting filesystems (e.g. overlay) with mapped euid 0, but without actual root privilege. Usage: (Note that `unshare --mount` requires `--map-root-user`) user$ mkdir lower upper work rootfs user$ curl http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-minirootfs-3.7.0-x86_64.tar.gz \| tar Cxz ./lower \|\| ( true; echo "mknod errors were ignored" ) user$ unshare --mount --map-root-user mappedroot# runc spec --rootless mappedroot# sed -i 's/"readonly": true/"readonly": false/g' config.json mappedroot# mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs ./rootfs mappedroot# runc run foo Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-05-10 12:16:43 +09:00
Aleksa Sarai	03e585985f	rootless: cgroup: treat EROFS as a skippable error In some cases, /sys/fs/cgroups is mounted read-only. In rootless containers we can consider this effectively identical to having cgroups that we don't have write permission to -- because the user isn't responsible for the read-only setup and cannot modify it. The rules are identical to when /sys/fs/cgroups is not writable by the unprivileged user. An example of this is the default configuration of Docker, where cgroups are mounted as read-only as a preventative security measure. Reported-by: Vladimir Rutsky <rutsky@google.com> Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-03-17 13:53:42 +11:00
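
Commit 27560ace2f describes the memory bandwidth schema format ("MB:<cache_id>=<percentage>;...") and the control-step rule min_bw + N * bw_gran. The Go sketch below is one way to illustrate both. It is not code from runc: parseMemBwSchema and roundToStep are hypothetical helper names, and reading "rounded to the next control step" as rounding up is an assumption. Range validation (for example, values above 100) is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemBwSchema is a hypothetical helper (not part of runc). It splits a
// schema such as "MB:0=20;1=70" into a map of cache ID to bandwidth percentage.
func parseMemBwSchema(s string) (map[int]int, error) {
	if !strings.HasPrefix(s, "MB:") {
		return nil, fmt.Errorf("schema %q lacks the MB: prefix", s)
	}
	out := make(map[int]int)
	for _, pair := range strings.Split(strings.TrimPrefix(s, "MB:"), ";") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", pair)
		}
		id, err := strconv.Atoi(kv[0])
		if err != nil {
			return nil, err
		}
		pct, err := strconv.Atoi(kv[1])
		if err != nil {
			return nil, err
		}
		out[id] = pct
	}
	return out, nil
}

// roundToStep returns the smallest value of the form minBw + N*gran (N >= 0)
// that is not below requested. Rounding up is an assumption made for this
// sketch; the hardware decides the actual behavior.
func roundToStep(requested, minBw, gran int) int {
	if requested <= minBw {
		return minBw
	}
	steps := (requested - minBw + gran - 1) / gran // ceiling division
	return minBw + steps*gran
}

func main() {
	// Values from the commit's example: minimum 10%, granularity 10%.
	bw, err := parseMemBwSchema("MB:0=20;1=70")
	if err != nil {
		panic(err)
	}
	for id := 0; id < len(bw); id++ {
		fmt.Printf("cache %d: requested %d%%, step %d%%\n", id, bw[id], roundToStep(bw[id], 10, 10))
	}
}
```

With the commit's example values, both requested percentages already sit on a control step, so they pass through unchanged; a request such as 25% would move to the next step under the rounding-up assumption.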
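
Commit 06f789cf26 spells out how `RootlessEUID` and `RootlessCgroups` are derived from the effective UID and the user namespace. The rules can be sketched as a small truth table in Go. This is an illustration only, not the runc implementation: rootlessFlags and stateRootless are hypothetical names, the caller is assumed to already know whether it runs in the initial user namespace, and the `--rootless=(auto|true|false)` override for `RootlessCgroups` is not modeled (the sketch covers only the default behavior described in the message).

```go
package main

import "fmt"

// rootlessFlags is a hypothetical helper that follows the rules in the commit
// message: RootlessEUID is true for a non-root euid, and RootlessCgroups is
// false only for root (euid == 0) in the initial user namespace.
func rootlessFlags(euid int, inInitialUserNS bool) (rootlessEUID, rootlessCgroups bool) {
	rootlessEUID = euid != 0
	rootlessCgroups = !(euid == 0 && inInitialUserNS)
	return rootlessEUID, rootlessCgroups
}

// stateRootless mirrors the note about state.json: `rootless` is true only
// when both flags are true.
func stateRootless(rootlessEUID, rootlessCgroups bool) bool {
	return rootlessEUID && rootlessCgroups
}

func main() {
	cases := []struct {
		name      string
		euid      int
		initialNS bool
	}{
		{"root, initial namespace", 0, true},
		{"root inside a user namespace", 0, false},
		{"unprivileged user", 1000, true},
	}
	for _, c := range cases {
		e, cg := rootlessFlags(c.euid, c.initialNS)
		fmt.Printf("%-30s RootlessEUID=%-5v RootlessCgroups=%-5v state.json rootless=%v\n",
			c.name, e, cg, stateRootless(e, cg))
	}
}
```

The middle case is the "runc-in-userns" situation from the commit: `RootlessEUID` is false, `RootlessCgroups` is true, and `state.json` therefore records `rootless` as false.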


90 Commits