We have quite a few external users of libcontainer/cgroups packages,
and they all have to depend on libcontainer/configs as well.
Let's move cgroup-related configuration to libcontainer/cgroups.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
A shared pid namespace means `runc kill` (or `runc delete -f`) has to
kill all container processes, not just init. To do so, it needs a cgroup
to read the PIDs from.
If there is no cgroup, processes will be leaked, so such a
configuration is bad and should not be allowed. To keep backward
compatibility, though, let's merely warn about this for now.
Alas, the only way to know if cgroup access is available is by returning
an error from Manager.Apply. Amend fs cgroup managers to do so (systemd
doesn't need it, since v1 can't work with rootless, and cgroup v2 does
not have a special rootless case).
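For context, killing all container processes boils down to reading the
pid list from the container's cgroup and signalling each pid; a rough
sketch (package and helper names are made up for the example, this is
not the actual runc code):

package cgkill

import (
    "fmt"
    "os"
    "strconv"
    "strings"
    "syscall"
)

// signalAll sends sig to every pid listed in the container's cgroup.
// cgroupDir is assumed to be the container's cgroup directory.
func signalAll(cgroupDir string, sig syscall.Signal) error {
    data, err := os.ReadFile(cgroupDir + "/cgroup.procs")
    if err != nil {
        return fmt.Errorf("reading cgroup.procs: %w", err)
    }
    for _, s := range strings.Fields(string(data)) {
        pid, err := strconv.Atoi(s)
        if err != nil {
            continue
        }
        // Best effort: a process may have exited already.
        _ = syscall.Kill(pid, sig)
    }
    return nil
}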
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There's too much logic here figuring out which CPUs to use. Runc is a
low-level tool and is not supposed to be that "smart". What's worse,
this logic is executed on every exec, making it slower. Some of the
logic in (*setnsProcess).start is executed even if no annotation is set,
thus making ALL execs slow.
Also, this should be a property of a process rather than an annotation.
The plan is to rework this.
This reverts commit afc23e3397.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit b6967fa84c moved the functionality of managing cgroup devices
into a separate package, and decoupled libcontainer/cgroups from it.
Yet, some software (e.g. cadvisor) may need to use the libcontainer
package, which imports libcontainer/cgroups/devices, thus making it
impossible to use libcontainer without bringing in the cgroups/devices
dependency.
In fact, we only need to manage devices in the runc binary, so move the
import to main.go (sketched below).
The need to import libct/cg/dev in order to manage devices is already
documented in libcontainer/cgroups, but let's
- update that documentation;
- add a similar note to libcontainer/cgroups/systemd;
- add a note to libct README.
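With this change, the runc main.go side essentially needs just the
import (a sketch; the comment explains why a plain import is enough):

import (
    // Importing this package sets the device-management function
    // variables in libcontainer/cgroups (and .../systemd) from its
    // init(); without it, cgroup managers return ErrDevicesUnsupported
    // when device rules are present.
    _ "github.com/opencontainers/runc/libcontainer/cgroups/devices"
)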
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This handles a corner case when joining a container whose processes all
run exclusively on isolated CPU cores: the kernel is forced to schedule
the runc process on the first CPU core within the cgroup's cpuset.
Kernel commit 46a87b3851f0d6eb05e6d83d5c5a30df0eca8f76 affected this
deterministic scheduling behavior by distributing tasks across CPU
cores within the cgroup's cpuset. Some intensive real-time applications
rely on the deterministic behavior and use the first CPU core to run a
slow thread, while the other CPU cores are fully used by real-time
threads with the SCHED_FIFO policy. Such applications prevent the runc
process from joining a container when the runc process is randomly
scheduled on a CPU core owned by a real-time thread.
This introduces the isolated CPU affinity transition OCI runtime
annotation, org.opencontainers.runc.exec.isolated-cpu-affinity-transition,
to restore that behavior during runc exec.
This also fixes an issue with kernels >= 6.2 not resetting the CPU
affinity of container processes.
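For illustration, the annotation would be set in the container's OCI
spec roughly as follows (the helper name and the value are placeholders
for the sketch; see the runc documentation for accepted values):

package example

import (
    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// enableAffinityTransition marks the container so that runc exec
// applies the isolated CPU affinity transition. The value here is a
// placeholder.
func enableAffinityTransition(spec *specs.Spec) {
    if spec.Annotations == nil {
        spec.Annotations = map[string]string{}
    }
    spec.Annotations["org.opencontainers.runc.exec.isolated-cpu-affinity-transition"] = "temporary"
}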
Signed-off-by: Cédric Clerget <cedric.clerget@gmail.com>
This commit separates the functionality of setting cgroup device rules
out of libct/cgroups into the libct/cgroups/devices package. That
package, if imported, sets the function variables in libct/cgroups and
libct/cgroups/systemd, so that a cgroup manager can use those to manage
devices. If those function variables are nil (i.e. libct/cgroups/devices
is not imported), a cgroup manager returns ErrDevicesUnsupported in
case any device rules are set in Resources.
It also consolidates the code from libct/cgroups/ebpf and
libct/cgroups/ebpf/devicefilter into libct/cgroups/devices.
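The function-variable pattern described above looks roughly like this
(names and types are simplified for the sketch, not the exact runc
declarations):

package cgroups

import "errors"

// ErrDevicesUnsupported is returned when device rules are requested but
// the devices package has not been imported.
var ErrDevicesUnsupported = errors.New("libcontainer/cgroups/devices is not imported")

// Resources is a stand-in for the real resources type carrying device rules.
type Resources struct {
    Devices []string
}

// DevicesSetV1 is nil by default; libct/cgroups/devices assigns the
// real implementation from an init() function when it is imported.
var DevicesSetV1 func(path string, r *Resources) error

func setDevices(path string, r *Resources) error {
    if len(r.Devices) == 0 {
        return nil // no device rules, no devices dependency needed
    }
    if DevicesSetV1 == nil {
        return ErrDevicesUnsupported
    }
    return DevicesSetV1(path, r)
}

The devices package then assigns the real implementations to these
variables from init(), so a plain import is enough to enable device
management.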
Moved some tests in libct/cg/sd that require device management to
libct/sd/devices.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Only some libcontainer packages can be built on non-linux platforms
(not that it makes sense, but at least go build succeeds). Let's call
these "good" packages.
For all other packages (i.e. the ones that fail to build with GOOS
other than linux), it does not make sense to have a linux build tag (as
they are already broken, and thus are not and cannot be used on
anything other than Linux).
Remove the linux build tag from all non-"good" packages.
This was mostly done by the following script, with just a few manual
fixes on top.
function list_good_pkgs() {
    for pkg in $(find . -type d -print); do
        GOOS=freebsd go build $pkg 2>/dev/null \
            && GOOS=solaris go build $pkg 2>/dev/null \
            && echo $pkg
    done | sed -e 's|^./||' | tr '\n' '|' | sed -e 's/|$//'
}
function remove_tag() {
    sed -i -e '\|^// +build linux$|d' $1
    go fmt $1
}
SKIP="^("$(list_good_pkgs)")"
for f in $(git ls-files . | grep .go$); do
    if echo $f | grep -qE "$SKIP"; then
        echo skip $f
        continue
    fi
    echo proc $f
    remove_tag $f
done
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
A cgroup manager's Set method sets cgroup resources, but historically
it accepted a whole configs.Cgroup.
Refactor it to accept resources only. This is an improvement from the
API point of view, as the method cannot change the cgroup configuration
(such as the path to the cgroup, etc.); it can only set (modify) its
resources/limits.
This also lays the foundation for complicated resource updates, as Set
now has two sets of resources -- the one previously specified during
cgroup manager creation (or the previous Set), and the one passed in
the argument -- so it can deduce the difference between them. This is a
long-term goal, though.
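In terms of the Manager interface, the change amounts to roughly this
(a sketch, not the verbatim declaration):

package cgroups

import "github.com/opencontainers/runc/libcontainer/configs"

type Manager interface {
    // Before: Set(container *configs.Cgroup) error
    // After: Set only modifies resources/limits and can no longer
    // change the cgroup configuration itself (path, name, etc.).
    Set(r *configs.Resources) error
}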
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In some cases, container init fails to start because it is killed by
the kernel OOM killer. The errors returned by runc in such cases are
semi-random and rather cryptic. Below are a few examples.
On cgroup v1 + systemd cgroup driver:
> process_linux.go:348: copying bootstrap data to pipe caused: write init-p: broken pipe
> process_linux.go:352: getting the final child's pid from pipe caused: EOF
On cgroup v2:
> process_linux.go:495: container init caused: read init-p: connection reset by peer
> process_linux.go:484: writing syncT 'resume' caused: write init-p: broken pipe
This commit adds the OOM method to cgroup managers, which tells whether
the container was OOM-killed. In case that has happened, the original error
is discarded (unless --debug is set), and the new OOM error is reported
instead:
> ERRO[0000] container_linux.go:367: starting container process caused: container init was OOM-killed (memory limit too low?)
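For illustration, on cgroup v2 such a check boils down to parsing the
oom_kill counter from memory.events; a rough sketch (not the actual
runc code):

package cgroups

import (
    "bufio"
    "os"
    "strconv"
    "strings"
)

// oomKillCount returns the value of the oom_kill field from the
// memory.events file in the given cgroup directory (cgroup v2).
func oomKillCount(cgroupDir string) (uint64, error) {
    f, err := os.Open(cgroupDir + "/memory.events")
    if err != nil {
        return 0, err
    }
    defer f.Close()
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        fields := strings.Fields(sc.Text())
        if len(fields) == 2 && fields[0] == "oom_kill" {
            return strconv.ParseUint(fields[1], 10, 64)
        }
    }
    return 0, sc.Err()
}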
Also, fix the rootless test cases that fail because they expect an
error in the first line, while we now print an additional warning first:
> unable to get oom kill count" error="no directory specified for memory.oom_control
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In most projects, "utils" is a big mess, and this one is not an
exception. Try to clean it up a bit by moving cgroup v1-specific code
to a separate source file.
There are no code changes in this commit; code is merely moved from one
file to another.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When using cgroups with the systemd driver, the cgroup path is
automatically removed by systemd once all processes have exited. So we
should check that the cgroup path exists whenever we access it (for
example in `kill`/`ps`), or else we will get an error.
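A minimal sketch of such a check (the helper name is illustrative):

package cgroups

import "os"

// pathExists reports whether the cgroup path still exists; with the
// systemd driver, systemd may already have removed it once all the
// container processes exited.
func pathExists(path string) bool {
    _, err := os.Stat(path)
    return err == nil
}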
Signed-off-by: lifubang <lifubang@acmcoder.com>
This is effectively a nicer implementation of the container.isPaused()
helper, but to be used within the cgroup code for handling some fun
issues we have to fix with the systemd cgroup driver.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This unties the Gordian Knot of using GetPaths in cgroupv2 code.
The problem is, the current code uses GetPaths for three kinds of things:
1. Get all the paths to cgroup v1 controllers to save its state (see
(*linuxContainer).currentState(), (*LinuxFactory).loadState()
methods).
2. Get all the paths to cgroup v1 controllers to have the setns process
enter the proper cgroups in `(*setnsProcess).start()`.
3. Get the path to a specific controller (for example,
`m.GetPaths()["devices"]`).
Now, for cgroup v2, instead of a set of per-controller paths, we have a
single unified path, and a dedicated function `GetUnifiedPath()` to get it.
This discrepancy between v1 and v2 cgroupManager API leads to the
following problems with the code:
- multiple if/else code blocks that have to treat v1 and v2 separately;
- backward-compatible GetPaths() methods in v2 controllers;
- repeated writing of the PID into the same cgroup for v2.
Overall, it's hard to write the right code with all this, and the code
that is written is kinda hard to follow.
The solution is to slightly change the API to do the 3 things outlined
above in the same manner for v1 and v2:
1. Use `GetPaths()` for state saving and setns process cgroups entering.
2. Introduce and use Path(subsys string) to obtain a path to a
subsystem. For v2, the argument is ignored and the unified path is
returned.
This commit converts all the controllers to the new API, and modifies
all the users to use it.
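For example, a v1-only lookup such as m.GetPaths()["devices"] becomes a
version-agnostic call (a sketch of the intended usage, not verbatim
runc code):

package example

// manager is a stand-in for the relevant part of the cgroup Manager API.
type manager interface {
    // GetPaths is used for state saving and for entering cgroups in
    // the setns process (per-controller paths on v1, one path on v2).
    GetPaths() map[string]string
    // Path returns the path to the given subsystem; on v2 the argument
    // is ignored and the unified path is returned.
    Path(subsys string) string
}

// devicesPath replaces the v1-only m.GetPaths()["devices"] lookup and
// works for both cgroup v1 and v2.
func devicesPath(m manager) string {
    return m.Path("devices")
}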
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
`configs.Cgroup` contains the configuration used to create cgroups. This
configuration must be saved to disk, since it's required to restore the
cgroup manager that was used to create the cgroups.
Add a method to get the cgroup configuration from a cgroup Manager, to
allow API users to save it to disk and restore a cgroup manager later.
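The added method might look roughly like this (a sketch; see the
Manager interface for the exact signature):

package cgroups

import "github.com/opencontainers/runc/libcontainer/configs"

type Manager interface {
    // GetCgroups returns the configs.Cgroup this manager was created
    // with, so API users can persist it and restore the manager later.
    GetCgroups() (*configs.Cgroup, error)
}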
Fixes #2176
Signed-off-by: Julio Montes <julio.montes@intel.com>
Implemented `runc ps` for cgroup v2, using the newly added method `m.GetUnifiedPath()`.
Unlike the v1 implementation that checks `m.GetPaths()["devices"]`, the v2 implementation does not require the device controller to be available.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>