zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-09-27 03:46:19 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	37b5acc2d7	libct: use manager.AddPid to add exec to cgroup The main benefit here is that when we are using a systemd cgroup driver, we actually ask systemd to add a PID, rather than doing it ourselves. This way, we can add a rootless exec PID to a cgroup. This requires newer opencontainers/cgroups and coreos/go-systemd. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-09-16 13:31:16 -07:00
Kir Kolyshkin	a75076b4a4	Switch to opencontainers/cgroups This removes libcontainer/cgroups packages and starts using those from github.com/opencontainers/cgroups repo. Mostly generated by: git rm -f libcontainer/cgroups find . -type f -name "*.go" -exec sed -i \ 's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' \ {} + go get github.com/opencontainers/cgroups@v0.0.1 make vendor gofumpt -w . Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-28 15:20:33 -08:00
Kir Kolyshkin	a56f85f87b	libct/*: switch from configs to cgroups Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-12-11 19:08:40 -08:00
Kir Kolyshkin	1c505fffdc	Revert "Set temporary single CPU affinity..." There's too much logic here figuring out which CPUs to use. Runc is a low level tool and is not supposed to be that "smart". What's worse, this logic is executed on every exec, making it slower. Some of the logic in (*setnsProcess).start is executed even if no annotation is set, thus making ALL execs slow. Also, this should be a property of a process, rather than annotation. The plan is to rework this. This reverts commit `afc23e3397`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-06-10 06:31:03 +08:00
Cédric Clerget	afc23e3397	Set temporary single CPU affinity before cgroup cpuset transition. This handles a corner case when joining a container whose processes all run exclusively on isolated CPU cores: it forces the kernel to schedule the runc process on the first CPU core within the cgroup's cpuset. The introduction of the kernel commit 46a87b3851f0d6eb05e6d83d5c5a30df0eca8f76 has affected this deterministic scheduling behavior by distributing tasks across CPU cores within the cgroup's cpuset. Some intensive real-time applications rely on this deterministic behavior and use the first CPU core to run a slow thread while the other CPU cores are fully used by real-time threads with the SCHED_FIFO policy. Such applications prevent the runc process from joining a container when it is randomly scheduled on a CPU core owned by a real-time thread. Introduce the isolated CPU affinity transition OCI runtime annotation org.opencontainers.runc.exec.isolated-cpu-affinity-transition to restore this behavior during runc exec. Fix an issue with kernel >= 6.2 not resetting CPU affinity for container processes. Signed-off-by: Cédric Clerget <cedric.clerget@gmail.com>	2024-04-16 08:59:49 +02:00
Kir Kolyshkin	efbebb39b5	libct: rename root to stateDir in struct Container The name "root" (or "containerRoot") is confusing; one might think it is the root of container's file system (the directory we chroot into). Rename to stateDir for clarity. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-10-04 14:57:10 +11:00
Kir Kolyshkin	102b8abd26	libct: rm BaseContainer and Container interfaces The only implementation of these is linuxContainer. It does not make sense to have an interface with a single implementation, and we do not foresee other types of containers being added to runc. Remove BaseContainer and Container interfaces, moving their methods documentation to linuxContainer. Rename linuxContainer to Container. Convert users from the interface to the struct. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-03-23 11:04:12 -07:00
Kir Kolyshkin	85932850ec	libct: rm TestGetContainerStats, mockIntelRdtManager TestGetContainerStats tests a function that is smaller than the test itself, and only calls a couple of other functions (which are represented by mocks). It does not make sense to have it. mockIntelRdtManager is only needed for TestGetContainerStats and TestGetContainerState, which basically tests that Path is called. Also, it does not make much sense to have it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-03 17:33:03 -08:00
Kir Kolyshkin	9ff64c3d97	*: rm redundant linux build tag For files that end with _linux.go or _linux_test.go, there is no need to specify linux build tag, as it is assumed from the file name. In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go for the file name to make sense. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-30 20:15:00 -07:00
Kir Kolyshkin	a91ce3062f	libct/*_test.go: use t.TempDir Replace ioutil.TempDir (mostly) with t.TempDir, which requires no explicit cleanup. While at it, fix incorrect usage of os.ModePerm in libcontainer/intelrdt test. This is supposed to be a mask, not mode bits. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-07-27 01:41:47 -07:00
Kir Kolyshkin	e6048715e4	Use gofumpt to format code gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules. Brought to you by git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w Looking at the diff, all these changes make sense. Also, replace gofmt with gofumpt in golangci.yml. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-01 12:17:27 -07:00
Kir Kolyshkin	3f65946756	libct/cg: make Set accept configs.Resources A cgroup manager's Set method sets cgroup resources, but historically it was accepting configs.Cgroups. Refactor it to accept resources only. This is an improvement from the API point of view, as the method cannot change cgroup configuration (such as path to the cgroup etc), it can only set (modify) its resources/limits. This also lays the foundation for complicated resource updates, as now Set has two sets of resources -- the one that was previously specified during cgroup manager creation (or the previous Set), and the one passed in the argument, so it could deduce the difference between these. This is a long term goal though. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-29 15:24:19 -07:00
Kir Kolyshkin	201d60c51d	runc run/start/exec: fix init log forwarding race Sometimes debug.bats test cases are failing like this: > not ok 27 global --debug to --log --log-format 'json' > # (in test file tests/integration/debug.bats, line 77) > # `[[ "${output}" == "child process in init()" ]]' failed It happens more when writing to disk. This issue is caused by the fact that runc spawns log forwarding goroutine (ForwardLogs) but does not wait for it to finish, resulting in missing debug lines from nsexec. ForwardLogs itself, though, never finishes, because it reads from the reading side of a pipe whose writing side is not closed. This is especially true in case of runc create, which spawns runc init and exits; meanwhile runc init waits on exec fifo for arbitrarily long time before doing execve. So, to fix the failure described above, we need to: 1. Make runc create/run/exec wait for ForwardLogs to finish; 2. Make runc init close its log pipe file descriptor (i.e. the one whose value is passed in _LIBCONTAINER_LOGPIPE environment variable). This is exactly what this commit does: 1. Amend ForwardLogs to return a channel, and wait for it in start(). 2. In runc init, save the log fd and close it as late as possible. PS I have to admit I still do not understand why an explicit close of log pipe fd is required in e.g. (*linuxSetnsInit).Init, right before the execve which (thanks to CLOEXEC) closes the fd anyway. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-03-25 19:18:55 -07:00
Kir Kolyshkin	5d0ffbf9c8	runc start/run: report OOM In some cases, container init fails to start because it is killed by the kernel OOM killer. The errors returned by runc in such cases are semi-random and rather cryptic. Below are a few examples. On cgroup v1 + systemd cgroup driver: > process_linux.go:348: copying bootstrap data to pipe caused: write init-p: broken pipe > process_linux.go:352: getting the final child's pid from pipe caused: EOF On cgroup v2: > process_linux.go:495: container init caused: read init-p: connection reset by peer > process_linux.go:484: writing syncT 'resume' caused: write init-p: broken pipe This commit adds the OOM method to cgroup managers, which tells whether the container was OOM-killed. In case that has happened, the original error is discarded (unless --debug is set), and the new OOM error is reported instead: > ERRO[0000] container_linux.go:367: starting container process caused: container init was OOM-killed (memory limit too low?) Also, fix the rootless test cases that are failing because they expect an error in the first line, and we have an additional warning now: > unable to get oom kill count" error="no directory specified for memory.oom_control Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-02-23 16:15:33 -08:00
Xiaochen Shen	f62ad4a0de	libcontainer/intelrdt: rename CAT and MBA enabled flags Rename CAT and MBA enabled flags to be consistent with others. No functional change. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2020-11-10 15:32:01 +08:00
Kir Kolyshkin	a77d7b1d0f	libct: don't use GetPaths Since commit `714c91e9f7`, method GetPaths() should only be used for saving container state. For other uses, we have a new method, Path(), which is cleaner. Fix GetPaths() usage introduced by recent commits `859a780d6f` and `9087f2e82`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-15 18:27:34 -07:00
lifubang	9087f2e827	fix path error in systemd when stopped When we use cgroup with systemd driver, the cgroup path will be automatically removed by systemd when all processes have exited. So we should check that the cgroup path exists when we access it, for example in `kill/ps`, otherwise we will get an error. Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-06-02 18:17:43 +08:00
Aleksa Sarai	859a780d6f	cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Kir Kolyshkin	714c91e9f7	Simplify cgroup path handling in v2 via unified API This unties the Gordian Knot of using GetPaths in cgroupv2 code. The problem is, the current code uses GetPaths for three kinds of things: 1. Get all the paths to cgroup v1 controllers to save its state (see (linuxContainer).currentState(), (LinuxFactory).loadState() methods). 2. Get all the paths to cgroup v1 controllers to have the setns process enter the proper cgroups in `(*setnsProcess).start()`. 3. Get the path to a specific controller (for example, `m.GetPaths()["devices"]`). Now, for cgroup v2 instead of a set of per-controller paths, we have only one single unified path, and a dedicated function `GetUnifiedPath()` to get it. This discrepancy between v1 and v2 cgroupManager API leads to the following problems with the code: - multiple if/else code blocks that have to treat v1 and v2 separately; - backward-compatible GetPaths() methods in v2 controllers; - repeated writing of the PID into the same cgroup for v2; Overall, it's hard to write the right code with all this, and the code that is written is kinda hard to follow. The solution is to slightly change the API to do the 3 things outlined above in the same manner for v1 and v2: 1. Use `GetPaths()` for state saving and setns process cgroups entering. 2. Introduce and use Path(subsys string) to obtain a path to a subsystem. For v2, the argument is ignored and the unified path is returned. This commit converts all the controllers to the new API, and modifies all the users to use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-08 12:04:06 -07:00
Akihiro Suda	4540b596b8	Fix TestGetContainerStateAfterUpdate on cgroup v2 CI was failing on cgroup v2 because mockCgroupManager.GetUnifiedPath() was returning an error. Now the function returns the value of mockCgroupManager.unifiedPath, but the value is currently not used in the tests. Fix #2286 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-03 09:12:38 +09:00
Julio Montes	8ddd892072	libcontainer: add method to get cgroup config from cgroup Manager `configs.Cgroup` contains the configuration used to create cgroups. This configuration must be saved to disk, since it's required to restore the cgroup manager that was used to create the cgroups. Add method to get cgroup configuration from cgroup Manager to allow API users to save it to disk and restore a cgroup manager later. fixes #2176 Signed-off-by: Julio Montes <julio.montes@intel.com>	2019-12-17 22:46:03 +00:00
Akihiro Suda	dbd771e475	cgroup2: implement `runc ps` Implemented `runc ps` for cgroup v2, using a newly added method `m.GetUnifiedPath()`. Unlike the v1 implementation that checks `m.GetPaths()["devices"]`, the v2 implementation does not require the device controller to be available. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2019-10-19 01:59:24 +09:00
Kenta Tada	65032b55b1	libcontainer: fix TestGetContainerState to check configs.NEWCGROUP This test needs to handle the case of configs.NEWCGROUP as Namespace's type. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2019-05-21 09:10:38 +09:00
Danail Branekov	c486e3c406	Address comments in PR 1861 Refactor configuring logging into a reusable component so that it can be nicely used in both main() and init process init() Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com> Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io> Co-authored-by: Claudia Beresford <cberesford@pivotal.io> Signed-off-by: Danail Branekov <danailster@gmail.com>	2019-04-04 14:57:28 +03:00
JoeWrightss	0855bce448	Fix .Fatalf() error message Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>	2018-12-19 20:22:48 +08:00
Xiaochen Shen	f097339289	libcontainer: intelrdt: add test cases for Intel RDT/MBA Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:39 +08:00
Yan Zhu	feb90346e0	doc: fix typo Signed-off-by: Yan Zhu <yanzhu@alauda.io>	2018-09-07 11:58:59 +08:00
Danail Branekov	0495fece57	Ensure container tests do not write on the host TestGetContainerStateAfterUpdate creates its state.json file in the current directory, which turns out to be the host runc directory. Thus whenever the test completes it leaves the state.json file behind, thus a) polluting the local git repository b) changing the host file system, violating the principle of doing everything in an isolated container environment This change creates a new temporary (in-container) directory and uses it as linuxContainer.root Signed-off-by: Tom Godkin <tgodkin@pivotal.io>	2017-11-27 10:43:10 +02:00
Xiaochen Shen	88d22fde40	libcontainer: intelrdt: use init() to avoid race condition This is the follow-up PR of #1279 to fix remaining issues: Use init() to avoid race condition in IsIntelRdtEnabled(). Also rename some variables and functions. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-08 17:15:31 +08:00
Xiaochen Shen	4d2756c116	libcontainer: add test cases for Intel RDT/CAT Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:35:40 +08:00
Qiang Huang	e6e1c34a7d	Update state after update state.json should be a reflection of the container's realtime state, including resource configurations, so we should update state.json after updating container resources. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-15 14:38:44 +08:00
Yuanhong Peng	e939079acf	Always save own namespace paths fix #1476 If containerA shares a namespace, say the ipc namespace, with containerB, then its ipc namespace path would be the same as containerB's and be stored in `state.json`. Exec into containerA will just read the namespace paths stored in this file and join these namespaces. So, if containerB has already been stopped, `docker exec containerA` will fail. To address this issue, we should always save our own namespace paths regardless of whether we share namespaces with other containers. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>	2017-07-13 16:13:05 +08:00
W. Trevor King	75d98b26b7	libcontainer: Replace GetProcessStartTime with Stat_t.StartTime And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
Akihiro Suda	1829531241	Fix trivial style errors reported by `go vet` and `golint` No substantial code change. Note that some style errors reported by `golint` are not fixed due to possible compatibility issues. Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>	2016-04-12 08:13:16 +00:00
Doug Davis	ff034a5119	Remove the nullState Add a "createdState" in its place since I think that better describes what it's used for. Signed-off-by: Doug Davis <dug@us.ibm.com>	2016-01-25 00:26:11 -08:00
Jimmi Dyson	91c7024e52	Revert to non-recursive GetPids, add recursive GetAllPids Signed-off-by: Jimmi Dyson <jimmidyson@gmail.com>	2016-01-08 19:42:25 +00:00
Michael Crosby	4415446c32	Add state pattern for container state transition Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Add state status() method Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Allow multiple checkpoint on restore Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Handle leave-running state Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Fix state transitions for inprocess Because the tests use libcontainer in process between the various states we need to ensure that that usecase works as well as the out of process one. Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Remove isDestroyed method Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Handling Pausing from freezer state Signed-off-by: Rajasekaran <rajasec79@gmail.com> freezer status Signed-off-by: Rajasekaran <rajasec79@gmail.com> Fixing review comments Signed-off-by: Rajasekaran <rajasec79@gmail.com> Added comment when freezer not available Signed-off-by: Rajasekaran <rajasec79@gmail.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Conflicts: libcontainer/container_linux.go Change checkFreezer logic to isPaused() Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Remove state base and factor out destroy func Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Add unit test for state transitions Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-12-17 13:55:38 -08:00
Michael Crosby	080df7ab88	Update import paths for new repository Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:59 -07:00
Michael Crosby	8f97d39dd2	Move libcontainer into subdirectory Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:15 -07:00

39 Commits