Apparently, the parent never closes the execFifo fd. This is not a
problem for runc per se, but it can be an issue for a user of libcontainer.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Sometimes debug.bats test cases fail like this:
> not ok 27 global --debug to --log --log-format 'json'
> # (in test file tests/integration/debug.bats, line 77)
> # `[[ "${output}" == *"child process in init()"* ]]' failed
It happens more often when writing to disk.
This issue is caused by the fact that runc spawns a log forwarding
goroutine (ForwardLogs) but does not wait for it to finish, resulting in
missing debug lines from nsexec.
ForwardLogs itself, though, never finishes, because it reads from the
reading side of a pipe whose writing side is not closed. This is
especially true in the case of runc create, which spawns runc init and
exits; meanwhile, runc init waits on the exec fifo for an arbitrarily
long time before doing execve.
So, to fix the failure described above, we need to:
1. Make runc create/run/exec wait for ForwardLogs to finish;
2. Make runc init close its log pipe file descriptor (i.e. the one
whose value is passed in the _LIBCONTAINER_LOGPIPE environment
variable).
This is exactly what this commit does:
1. Amend ForwardLogs to return a channel, and wait for it in start().
2. In runc init, save the log fd and close it as late as possible.
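To illustrate item 1, here is a rough sketch of the pattern (simplified;
not the actual libcontainer code, only the shape of it):

```go
// ForwardLogs reads log lines from the read side of the log pipe and
// forwards them, signalling completion via the returned channel.
// Imports: bufio, io, github.com/sirupsen/logrus.
func ForwardLogs(logPipe io.Reader) chan error {
	done := make(chan error, 1)
	go func() {
		s := bufio.NewScanner(logPipe)
		for s.Scan() {
			logrus.Debug(s.Text())
		}
		// Scan only stops once the write side is closed (or on error),
		// so the caller can reliably wait on this channel.
		done <- s.Err()
	}()
	return done
}
```

start() can then keep the returned channel and do `err := <-logsDone`
once runc init has closed its copy of the write side.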
PS I have to admit I still do not understand why an explicit close of
the log pipe fd is required in e.g. (*linuxSetnsInit).Init, right before
the execve which (thanks to CLOEXEC) closes the fd anyway.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Switch from github.com/golang/protobuf (which appears to be obsolete)
to google.golang.org/protobuf (which appears to be its replacement).
This needs a bump to go-criu v5.
[v2: fix debug print in criuSwrk]
[v3: switch to go-criu v5]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case the log level is less than debug, this code does nothing, so
let's add a condition and skip it entirely.
Add a test case to make sure this code path is hit.
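A sketch of the kind of guard added (hedged; the exact code and log
message differ):

```go
// Skip debug-only work entirely unless debug logging is enabled.
// logrus is github.com/sirupsen/logrus; req is a hypothetical value.
if logrus.IsLevelEnabled(logrus.DebugLevel) {
	// ... build the (possibly expensive) debug output ...
	logrus.Debugf("sending request: %s", req)
}
```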
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
An exec may fail due to memory shortage (cgroup memory limits being too
tight), and the error message provided in this case is clueless:
> $ sudo ../runc/runc exec xx56 top
> ERRO[0000] exec failed: container_linux.go:367: starting container process caused: read init-p: connection reset by peer
Same as the previous commit for run/start, check the OOM kill counter
and report an OOM kill.
The differences from run are:
1. The container is already running and the OOM kill counter might not
be zero. This is why we have to read the counter before the exec and
after it fails.
2. An unrelated OOM kill event might occur in parallel with our exec
(and I see no way to find out which process was killed, except to
parse kernel logs, which seems excessive and not very reliable).
This is why we report a _possible_ OOM kill.
With this commit, the error message looks like:
> ERRO[0000] exec failed: container_linux.go:367: starting container process caused: process_linux.go:105: possibly OOM-killed caused: read init-p: connection reset by peer
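For illustration, a minimal cgroup v2 sketch of reading the counter (the
real code goes through the cgroup manager and also covers cgroup v1):

```go
// readOomKill returns the oom_kill counter from memory.events.
// Imports: os, path/filepath, strconv, strings.
func readOomKill(cgroupPath string) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(cgroupPath, "memory.events"))
	if err != nil {
		return 0, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "oom_kill" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, nil
}
```

The exec path reads the counter before starting the process and, if the
start fails, re-reads it; a higher value means the failure was possibly
caused by an OOM kill.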
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
runc was already re-creating mountpoints before calling CRIU to restore
a container. But mountpoints inside a bind mount were not re-created.
During initial container creation, runc mounts bind mounts and then
creates the necessary mountpoints for further mounts inside those bind
mounts.
If, for example, one of the bind mounts is a tmpfs and empty before
restore, CRIU will fail re-mounting all mounts because the mountpoints
in the bind mounted tmpfs no longer exist.
CRIU expects all mountpoints to exist just as they did during
checkpointing.
This changes runc to mount bind mounts after mountpoint creation to
ensure nested bind mounts have their mountpoints created before CRIU
does the restore.
Signed-off-by: Adrian Reber <areber@redhat.com>
This is a new variable added in Go 1.16, so we'll have to wait until
1.16 is the minimum supported version; thus a TODO for now.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
While working on a test case for [1], I got the following warning:
> level=warning msg="unable to terminate initProcess" error="exit status 1"
Obviously, the warning is bogus since the initProcess is terminated.
This is happening because terminate() can return errors from either
Kill() or Wait(), and the latter returns an error if the process has
not finished successfully (i.e. exit status is not 0 or it was killed).
Check for a particular error type and filter out those errors.
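Roughly (a sketch, not the actual method in libcontainer):

```go
// terminate kills the process and reaps it; an *exec.ExitError from
// Wait() is expected here (we just killed it), so it is filtered out.
// Imports: errors, os/exec.
func terminate(cmd *exec.Cmd) error {
	err := cmd.Process.Kill()
	if werr := cmd.Wait(); err == nil {
		err = werr
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return nil
	}
	return err
}
```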
[1] https://github.com/opencontainers/runc/issues/2683
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
> libcontainer/container_linux.go:768:2: S1017: should replace this `if` statement with an unconditional `strings.TrimPrefix` (gosimple)
> if strings.HasPrefix(mountDest, c.config.Rootfs) {
> ^
> libcontainer/container_linux.go:1150:2: S1017: should replace this `if` statement with an unconditional `strings.TrimPrefix` (gosimple)
> if strings.HasPrefix(mountDest, c.config.Rootfs) {
> ^
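The suggested change amounts to the following (assuming the if body
strips the prefix, which is what S1017 implies):

```go
// Before:
if strings.HasPrefix(mountDest, c.config.Rootfs) {
	mountDest = mountDest[len(c.config.Rootfs):]
}
// After (TrimPrefix is a no-op when the prefix is absent):
mountDest = strings.TrimPrefix(mountDest, c.config.Rootfs)
```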
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
> libcontainer/container_linux.go:683:2: S1021: should merge variable declaration with assignment on next line (gosimple)
> var t criurpc.CriuReqType
> ^
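I.e. the following kind of change (the assigned value here is just an
example, not the actual one):

```go
// Before:
var t criurpc.CriuReqType
t = criurpc.CriuReqType_NOTIFY
// After:
t := criurpc.CriuReqType_NOTIFY
```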
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 4415446c32 introduces this function, which is never used.
Remove it.
This fixes
> libcontainer/container_linux.go:1813:26: func `(*linuxContainer).deleteState` is unused (unused)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When we call terminate(), we kill the process, and wait returns an
error indicating that the process was killed.
This is exactly what we expect here, so there is no reason
to treat it as an error.
Before this patch, when a container with invalid cgroup parameters is
started:
> WARN[0000] unable to terminate initProcess error="signal: killed"
> ERRO[0000] container_linux.go:366: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "555": open /sys/fs/cgroup/blkio/user.slice/xx33/blkio.weight: permission denied
After:
> ERRO[0000] container_linux.go:366: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "555": open /sys/fs/cgroup/blkio/user.slice/xx33/blkio.weight: permission denied
I.e. the useless warning is gone.
NOTE this breaks a couple of integration test cases, since they were
expecting a particular message in the second line, and now, with
"signal: killed" removed, it is in the first line. Fix those, too.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case the cgroup configuration is invalid (some parameters can't be
set, etc.), p.manager.Set fails, the error is returned, and then we try to
remove cgroups (by calling p.manager.Destroy) in a defer.
The problem is, the container init is not yet killed (as it is killed in
the caller, i.e. (*linuxContainer).start), so cgroup removal fails like
this:
> time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy"
> ...
> time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy"
The above is repeated for every controller, and looks quite scary.
To fix, move the init termination to the abovementioned defer.
Do the same for (*setnsProcess).start() for uniformity.
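A rough sketch of the resulting structure (simplified; not the actual
code):

```go
func (p *initProcess) start() (retErr error) {
	defer func() {
		if retErr != nil {
			// Kill the child first, so that the cgroup is empty and
			// Destroy() does not fail with "device or resource busy".
			_ = p.terminate()
			_ = p.manager.Destroy()
		}
	}()
	// ... start the child, p.manager.Set(...), sync with init, etc. ...
	return nil
}
```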
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
runc already tells CRIU to restore into an existing network or PID
namespace if there is a path to a namespace specified in config.json.
PID and network have special handling in CRIU using CRIU's inherit_fd
interface.
For UTS, IPC and MOUNT namespaces CRIU can join those existing
namespaces using CRIU's join_ns interface.
This is especially interesting for environments where containers are
running in a pod which already has running containers (pause for
example) with namespaces configured and the restored container needs to
join these namespaces.
CRIU has no support for joining an existing CGROUP namespace (yet?),
which is why restoring a container with a path to a CGROUP namespace
specified will be aborted by runc.
CRIU does have support for restoring a container into an existing time
namespace, but runc does not yet seem to support time namespaces.
Signed-off-by: Adrian Reber <areber@redhat.com>
If the procRun state has been synced and the runc-create process has
been killed for some reason, the runc-init[2:stage] process will be
leaked. The runc command also fails to parse the root directory because
the container doesn't have state.json.
In order to make it possible to clean up the leaked runc-init[2:stage]
process, we should store the state before syncing procRun.
```before
current workflow:
[ child ] <-> [ parent ]
procHooks --> [run hooks]
<-- procResume
procReady --> [final setup]
<-- procRun
( killed for some reason)
( store state.json )
```
```expected
expected workflow:
[ child ] <-> [ parent ]
procHooks --> [run hooks]
<-- procResume
procReady --> [final setup]
store state.json
<-- procRun
```
Signed-off-by: Wei Fu <fuweid89@gmail.com>
If the CRIU binary is in a non-$PATH location and is passed to runc via
'--criu /path/to/criu', this information was not being passed to
go-criu, and since the switch to go-criu for CRIU version detection,
non-$PATH CRIU usage was broken. This uses the newly added go-criu interface
to pass the location of the binary to go-criu.
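Presumably along these lines (a hedged sketch; the import path/major
version and exact call site may differ):

```go
package main

import criu "github.com/checkpoint-restore/go-criu/v5"

// criuVersion queries the CRIU version via RPC, using the binary at
// criuPath (from runc's --criu flag) instead of "criu" found in $PATH.
func criuVersion(criuPath string) (int, error) {
	c := criu.MakeCriu()
	c.SetCriuPath(criuPath)
	return c.GetCriuVersion()
}
```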
Signed-off-by: Adrian Reber <areber@redhat.com>
Trying to checkpoint a container out of a pod in cri-o fails with:
Error (criu/namespaces.c:1081): Can't dump a pid namespace without the process init
Starting with the upcoming CRIU release 3.15, CRIU can be told to ignore
the PID namespace during checkpointing and to restore processes into an
existing network namespace.
With the changes from this commit and CRIU 3.15 it is possible to
checkpoint a container out of a pod in cri-o.
Signed-off-by: Adrian Reber <areber@redhat.com>
To checkpoint and restore a container with an external network namespace
(like with Podman and CNI), runc tells CRIU to ignore the network
namespace during checkpoint and restore.
This commit moves that code to their own functions to be able to reuse
the same code path for external PID namespaces which are necessary for
checkpointing and restoring containers out of a pod in cri-o.
Signed-off-by: Adrian Reber <areber@redhat.com>
The kubelet uses libct/cgroups code to set up cgroups. It creates a
parent cgroup (kubepods) to put the containers into.
The problem (for cgroup v2, which uses eBPF for device configuration)
is that the hard requirement to have the devices cgroup configured
results in leaking an eBPF program upon every kubelet restart. If the
kubelet is restarted 64+ times, the cgroup can't be configured anymore.
Work around this by adding a SkipDevices flag to Resources.
A check was added so that if SkipDevices is set, such a "container"
can't be started (to make sure it is only used for non-containers).
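For example (hedged sketch; configs is
github.com/opencontainers/runc/libcontainer/configs):

```go
// A kubepods-style parent cgroup that leaves the devices controller
// (and thus eBPF device programs on cgroup v2) alone.
cfg := &configs.Cgroup{
	Name:   "kubepods",
	Parent: "/",
	Resources: &configs.Resources{
		SkipDevices: true,
	},
}
```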
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When we use cgroups with the systemd driver, the cgroup path will be
automatically removed by systemd once all processes have exited. So we
should check that the cgroup path exists when we access it, for example
in `kill/ps`, or else we will get an error.
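E.g. something along these lines before touching the path (a sketch
only, not the actual code):

```go
// If systemd already removed the cgroup, treat it as "no processes"
// rather than returning a confusing error. Imports: os.
if _, err := os.Stat(cgroupPath); err != nil {
	if os.IsNotExist(err) {
		return nil
	}
	return err
}
```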
Signed-off-by: lifubang <lifubang@acmcoder.com>
This updates to the latest version of go-criu (4.0.2) which is based on
CRIU 3.14.
As go-criu provides an existing way to query the CRIU binary for its
version, this also removes all the code from runc that handles CRIU
version checking and now relies on go-criu.
An important side effect of this change is that it raises the minimum
CRIU version to 3.0.0, as that is the first CRIU version that supports
CRIU version queries via RPC, in contrast to parsing the output of
'criu --version'. CRIU 3.0 was released in April 2017.
Signed-off-by: Adrian Reber <areber@redhat.com>
This is effectively a nicer implementation of the container.isPaused()
helper, but to be used within the cgroup code for handling some fun
issues we have to fix with the systemd cgroup driver.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
1. The command `runc checkpoint --lazy-server --status-fd $FD` actually
accepts a file name as an $FD. Make it accept a file descriptor,
as its name implies and the documentation states.
In addition, since runc itself does not use the result of CRIU status
fd, remove the code which relays it, and pass the FD directly to CRIU.
Note 1: runc should close this file descriptor itself after passing it
to criu, otherwise whoever waits on it might wait forever.
Note 2: due to the way criu swrk consumes the fd (it reopens
/proc/$SENDER_PID/fd/$FD), runc can't close it as soon as criu swrk has
started. There is no good way to know when criu swrk has reopened the
fd, so we assume that as soon as we have received something back, the
fd is already reopened.
2. Since the meaning of --status-fd has changed, the test case using
it needs to be fixed as well.
Modify the lazy migration test to remove "sleep 2", actually waiting
for the lazy page server to be ready.
While at it,
- remove the double fork (using shell's background process is
sufficient here);
- check the exit code for "runc checkpoint" and "criu lazy-pages";
- remove the check for no errors in dump.log after restore, as we
are already checking its exit code.
[v2: properly close status fd after spawning criu]
[v3: move close status fd to after the first read]
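In code, the fd handling sketched in the notes above looks roughly like
this (hypothetical variable names; the rpc option carrying the fd is not
shown):

```go
// statusFd comes straight from --status-fd; wrap it so we can close it.
statusFile := os.NewFile(uintptr(statusFd), "criu-status-fd")
// ... pass int32(statusFd) to criu swrk in the rpc request ...
// Once the first response has been read back from criu swrk, it has
// already reopened /proc/$SENDER_PID/fd/$FD, so our copy can be closed.
_ = statusFile.Close()
```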
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>