zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-12-24 11:50:58 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	7fdec327a0	Use any instead of interface{} The keyword is available since Go 1.18 (see https://pkg.go.dev/builtin#any). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-31 17:15:06 -07:00
Kir Kolyshkin	a75076b4a4	Switch to opencontainers/cgroups This removes libcontainer/cgroups packages and starts using those from github.com/opencontainers/cgroups repo. Mostly generated by: git rm -f libcontainer/cgroups; find . -type f -name "*.go" -exec sed -i 's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' {} +; go get github.com/opencontainers/cgroups@v0.0.1; make vendor; gofumpt -w . Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-28 15:20:33 -08:00
Kir Kolyshkin	6c9ddcc648	libct: switch from libct/devices to libct/cgroups/devices/config Use the old package name as an alias to minimize the patch. No functional change; this just eliminates a bunch of deprecation warnings. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-01-31 16:51:09 -08:00
Aleksa Sarai	ebcef3e651	specconv: temporarily allow userns path and mapping if they match It turns out that the error added in commit `09822c3da8` ("configs: disallow ambiguous userns and timens configurations") causes issues with containerd and CRIO because they pass both userns mappings and a userns path. These configurations are broken, but to avoid the regression in this one case, output a warning to tell the user that the configuration is incorrect but we will continue to use it if and only if the configured mappings are identical to the mappings of the provided namespace. Fixes: `09822c3da8` ("configs: disallow ambiguous userns and timens configurations") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-10 20:49:43 +11:00
Aleksa Sarai	09822c3da8	configs: disallow ambiguous userns and timens configurations For userns and timens, the mappings (and offsets, respectively) cannot be changed after the namespace is first configured. Thus, configuring a container with a namespace path to join means that you cannot also provide configuration for said namespace. Previously we would silently ignore the configuration (and just join the provided path), but we really should be returning an error (especially when you consider that the configuration userns mappings are used quite a bit in runc with the assumption that they are the correct mapping for the userns -- but in this case they are not). In the case of userns, the mappings are also required if you _do not_ specify a path, while in the case of the time namespace you can have a container with a timens but no mappings specified. It should be noted that the case checking that the user has not specified a userns path and a userns mapping needs to be handled in specconv (as opposed to the configuration validator) because with this patchset we now cache the mappings of path-based userns configurations and thus the validator can't be sure whether the mapping is a cached mapping or a user-specified one. So we do the validation in specconv, and thus the test for this needs to be an integration test. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-05 17:46:09 +11:00
Kir Kolyshkin	68427f33d0	libct/seccomp/config: add missing KillThread, KillProcess OCI spec added SCMP_ACT_KILL_THREAD and SCMP_ACT_KILL_PROCESS almost two years ago ([1], [2]), but runc support was half-finished [3]. Add these actions, and modify the test case to check them. In addition, "runc features" now lists the new actions. [1] https://github.com/opencontainers/runtime-spec/pull/1044 [2] https://github.com/opencontainers/runtime-spec/pull/1064 [3] https://github.com/opencontainers/runc/pulls/3204 Fixes: `4a4d4f109b` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit `e74fdeb88a`)	2022-05-04 16:22:09 +02:00
Akihiro Suda	2436322fef	Merge pull request #3365 from kolyshkin/checkPropertyName-speedup libct/specconv: checkPropertyName speedup	2022-02-17 12:59:23 +09:00
Kir Kolyshkin	0d21515038	libct: remove Validator interface We only have one implementation of config validator, which is always used. It makes no sense to have Validator interface. Having validate.Validator field in Factory does not make sense for all the same reasons. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-03 11:40:29 -08:00
Kir Kolyshkin	d37a9726f3	libct/specconv: test nits Commit `643f8a2b40` renamed isValidName to checkPropertyName, but fell short of renaming its test and benchmark. Fix that. Fixes: `643f8a2b40` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-01 10:46:14 -08:00
Kir Kolyshkin	643f8a2b40	libct/specconv: nits 1. Decapitalize errors. 2. Rename isValidName to checkPropertyName. 3. Make it return a specific error. Suggested-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-17 17:32:28 -08:00
Kir Kolyshkin	029b73c1b0	libct/spec: replace isValidName regex with a function Also, add a simple test and a benchmark (just out of sheer curiosity). Benchmark results: name old time/op new time/op delta IsValidName-4 540ns ± 3% 45ns ± 1% -91.76% (p=0.008 n=5+5) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-12 20:23:49 -08:00
Kir Kolyshkin	6907becaf9	libct/specconv: remove isSecSuffix regex Commit `1cd71dfd7` added isSecSuffix, but the same thing can be done easily without a regex. This is faster and saves some init time and memory. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-12 20:23:47 -08:00
Neil Johnson	2e0ceaa935	fix createDevices when no Linux section Signed-off-by: Neil Johnson <najohnsn@us.ibm.com>	2021-10-04 17:37:19 -04:00
Mauricio Vásquez	c64aaf0e0b	libcontainer/specconv: extend SetupSeccomp tests Extend the SetupSeccomp tests by adding the following cases: - Test nil config - Test empty config - Test bad action and architecture - Test all possible actions Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>	2021-09-07 13:04:24 +02:00
Kir Kolyshkin	9ff64c3d97	*: rm redundant linux build tag For files that end with _linux.go or _linux_test.go, there is no need to specify linux build tag, as it is assumed from the file name. In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go for the file name to make sense. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-30 20:15:00 -07:00
Kir Kolyshkin	e6048715e4	Use gofumpt to format code gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules. Brought to you by git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w Looking at the diff, all these changes make sense. Also, replace gofmt with gofumpt in golangci.yml. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-01 12:17:27 -07:00
Aleksa Sarai	c7c70ce810	*: clean t.Skip messages Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2021-05-23 17:53:01 +10:00
Qiang Huang	2d38476c96	Merge pull request #2840 from kolyshkin/ignore-kmem Ignore kernel memory settings	2021-04-13 09:44:14 +08:00
Kir Kolyshkin	52390d6804	Ignore kernel memory settings This is a somewhat radical approach to deal with kernel memory. Per-cgroup kernel memory limiting was always problematic. A few examples: - older kernels had bugs and were even oopsing sometimes (best example is RHEL7 kernel); - kernel is unable to reclaim the kernel memory so once the limit is hit a cgroup is toasted; - some kernel memory allocations don't allow failing. In addition to that, - users don't have a clue about how to set kernel memory limits (as the concept is much more complicated than e.g. [user] memory); - different kernels might have different kernel memory usage, which is sort of unexpected; - cgroup v2 does not have a [dedicated] kmem limit knob, and thus runc silently ignores kernel memory limits for v2; - kernel v5.4 made cgroup v1 kmem.limit obsolete (see https://github.com/torvalds/linux/commit/0158115f702b). In view of all this, and as the runtime-spec lists memory.kernel and memory.kernelTCP as OPTIONAL, let's ignore kernel memory limits (for cgroup v1, same as we're already doing for v2). This should result in fewer bugs and better user experience. The only bad side effect from it might be that stat can show kernel memory usage as 0 (since the accounting is not enabled). [v2: add a warning in specconv that limits are ignored] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-12 12:18:11 -07:00
Kir Kolyshkin	27bb1bd5ea	libct/specconv/CreateCgroupConfig: don't set c.Parent default c.Parent is only used by systemd cgroup drivers, and both v1 and v2 drivers do have code to set the default if it is empty, so setting it here is redundant. In addition, in case of cgroup v2 rootless container setting it here is harmful as the default should be user.slice not system.slice. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-01 19:50:37 -07:00
Sebastiaan van Stijn	4fc2de77e9	libcontainer/devices: remove "Device" prefix from types Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-12-01 11:11:23 +01:00
Sebastiaan van Stijn	677baf22d2	libcontainer: isolate libcontainer/devices Move the Device-related types to libcontainer/devices, so that the package can be used in isolation. Aliases have been created in libcontainer/configs for backward compatibility. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-12-01 11:11:21 +01:00
Cesar Talledo	0709202da7	Remove runc default devices that overlap with spec devices. Runc has a set of default devices that it includes in Linux containers (e.g., /dev/null, /dev/random, /dev/tty, etc.). However, if the container's OCI spec includes all or a subset of those same devices, runc is currently not detecting the redundancy, causing it to create a libcontainer config that has redundant device configurations. This causes a failure in rootless mode, in particular when the /dev/tty device has a redundant config: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:70: creating device nodes caused: open /tmp/busyboxtest/rootfs/dev/tty: no such device or address The reason this fails in rootless mode only is that in this case runc sets up /dev/tty not by doing mknod (it's not allowed within a user-ns) but rather by creating a regular file under /dev/tty and bind-mounting the host's /dev/tty to the container's /dev/tty. When this operation is done redundantly, it fails the second time. This change fixes this problem by ensuring runc checks for redundant devices between the OCI spec it receives and the default devices it configures. If a redundant device is detected, the OCI spec takes priority. The change adds both a unit test and an integration test to verify the behavior. Without this fix, this new integration test fails as shown above. Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>	2020-08-07 16:46:15 -07:00
Renaud Gaubert	2f7bdf9d3b	Tests the new Hook Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2020-06-19 02:39:20 +00:00
Pradyumna Agrawal	4aa9101477	Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig The change ensures that the passed in value of NoNewPrivileges under spec.Process is reflected in the container config generated by specconv.CreateLibcontainerConfig Closes #2397 Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>	2020-05-11 13:38:14 -07:00
Akihiro Suda	cc183ca662	Merge pull request #2242 from AkihiroSuda/vendor-systemd vendor: update go-systemd and godbus	2020-03-25 02:40:22 +09:00
Akihiro Suda	492d525e55	vendor: update go-systemd and godbus Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-16 13:26:03 +09:00
l00397676	62cfad97ca	specconv: add a test case to check null spec.Process Signed-off-by: l00397676 <lujingxiao@huawei.com>	2020-03-10 11:43:51 +08:00
Kir Kolyshkin	1cd71dfd71	systemd properties: support for *Sec values Some systemd properties are documented as having "Sec" suffix (e.g. "TimeoutStopSec") but are expected to have "USec" suffix when passed over dbus, so let's provide appropriate conversion to improve compatibility. This means, one can specify TimeoutStopSec with a numeric argument, in seconds, and it will be properly converted to TimeoutStopUSec with the argument in microseconds. As a side bonus, even float values are converted, so e.g. TimeoutStopSec=1.5 is possible. This turned out a bit more tricky to implement than I originally expected, since there are a handful of numeric types in dbus and each one requires explicit conversion. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-02-17 16:07:19 -08:00
Kir Kolyshkin	4c5c3fb960	Support for setting systemd properties via annotations In case systemd is used to set cgroups for the container, it creates a scope unit dedicated to it (usually named `runc-$ID.scope`). This patch adds an ability to set arbitrary systemd properties for the systemd unit via runtime spec annotations. Initially this was developed as an ability to specify the `TimeoutStopUSec` property, but later generalized to work with arbitrary ones. Example usage: add the following to runtime spec (config.json): ``` "annotations": { "org.systemd.property.TimeoutStopUSec": "uint64 123456789", "org.systemd.property.CollectMode":"'inactive-or-failed'" }, ``` and start the container (e.g. `runc --systemd-cgroup run $ID`). The above will set the following systemd parameters: * `TimeoutStopSec` to 2 minutes and 3 seconds, * `CollectMode` to "inactive-or-failed". The values are in the gvariant format (see [1]). To figure out which type systemd expects for a particular parameter, see systemd sources. In particular, parameters with `USec` suffix require an `uint64` typed argument, while gvariant assumes int32 for a numeric values, therefore the explicit type is required. NOTE that systemd receives the time-typed parameters as USec but shows them (in `systemctl show`) as Sec. For example, the stop timeout should be set as `TimeoutStopUSec` but is shown as `TimeoutStopSec`. [1] https://developer.gnome.org/glib/stable/gvariant-text.html Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-02-17 16:07:19 -08:00
Julio Montes	cd7c59d042	libcontainer: export createCgroupConfig A `config.Cgroups` object is required to manipulate cgroups v1 and v2 using libcontainer. Export `createCgroupConfig` to allow API users to create `config.Cgroups` objects using directly libcontainer API. Signed-off-by: Julio Montes <julio.montes@intel.com>	2019-12-17 22:46:03 +00:00
Kenta Tada	b54fd85bbf	libcontainer: change seccomp test for clone syscall This commit changes the value of seccomp test for clone syscall. Also, hardcoded values should be changed because it is unclear which flags are tested. Related issues: * https://github.com/containerd/containerd/pull/3314 * https://github.com/moby/moby/pull/39308 * https://github.com/opencontainers/runtime-tools/pull/694 Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2019-06-04 18:52:00 +09:00
Ace-Tang	95d1aa1886	test: fix TestDupNamespaces add Root in created spec, or error message is 'Root must be specified' Signed-off-by: Ace-Tang <aceapril@126.com>	2018-11-06 11:36:27 +08:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost the same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
dlorenc	40680b2d37	Make the setupSeccomp function public. This function is useful for converting from the OCI spec format to the one used by runC/libcontainer. Signed-off-by: dlorenc <lorenc.d@gmail.com>	2018-04-17 10:47:22 -07:00
Lorenzo Fontana	780f8ef567	Specconv: Test create command hooks and seccomp setup Signed-off-by: Lorenzo Fontana <lo@linux.com>	2017-10-28 21:46:46 +02:00
Lorenzo Fontana	c0e6e12f9d	Test Cgroup creation and memory allocations Signed-off-by: Lorenzo Fontana <lo@linux.com>	2017-10-25 01:58:10 +02:00
Aleksa Sarai	d04cbc49d2	rootless: add autogenerated rootless config from `runc spec` Since this is a runC-specific feature, this belongs here over in opencontainers/ocitools (which is for generic OCI runtimes). In addition, we don't create a new network namespace. This is because currently if you want to set up a veth bridge you need CAP_NET_ADMIN in both network namespaces' pinned user namespace to create the necessary interfaces in each network namespace. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:21 +11:00
Aleksa Sarai	d2f49696b0	runc: add support for rootless containers This enables the support for the rootless container mode. There are many restrictions on what rootless containers can do, so many different runC commands have been disabled: * runc checkpoint * runc events * runc pause * runc ps * runc restore * runc resume * runc update The following commands work: * runc create * runc delete * runc exec * runc kill * runc list * runc run * runc spec * runc state In addition, any specification options that imply joining cgroups have also been disabled. This is due to support for unprivileged subtree management not being available from Linux upstream. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:24 +11:00
Mrunal Patel	4f9cb13b64	Update runtime spec to 1.0.0.rc5 Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-03-15 11:38:37 -07:00
Zhang Wei	8eea644ccc	Bump runtime-spec to v1.0.0-rc3 * Bump underlying runtime-spec to version 1.0.0-rc3 * Fix related changed struct names in config.go Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-12-17 14:02:35 +08:00
Zhang Wei	a0f7977f0f	Detect and forbid duplicated namespace in spec When spec file contains duplicated namespaces, e.g. specs: specs.Spec{ Linux: &specs.Linux{ Namespaces: []specs.Namespace{ { Type: "pid", }, { Type: "pid", Path: "/proc/1/ns/pid", }, }, }, } runc should report malformed spec instead of using latest one by default, because this spec could be quite confusing. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-10-27 00:44:36 +08:00
Adam Thomason	83cbdbd64c	Add checks for nil spec.Linux Signed-off-by: Adam Thomason <ad@mthomason.net>	2016-09-11 16:31:34 -07:00
Qiang Huang	220e5098a8	Fix default cgroup path Alternative of #895, part of #892. The intention of the current behavior is to create the cgroup in the parent cgroup of the current process, but we did this the wrong way: we used the devices cgroup path of the current process as the default parent path for all subsystems. This is wrong because we don't always have the same cgroup path for all subsystems. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-08-30 14:12:15 +08:00
Yen-Lin Chen	a318a2ae1b	Fixed typo in build constraint. Signed-off-by: Yenlin Chen <hencrice@gmail.com>	2016-07-15 19:24:22 -07:00
Michael Crosby	f417e993d0	Update spec to v0.5.0 Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-04-12 14:11:40 -07:00
Ido Yariv	28b21a5988	Export CreateLibcontainerConfig Users of libcontainer other than runc may also require parsing and converting specification configuration files. Since runc cannot be imported, move the relevant functions and definitions to a separate package, libcontainer/specconv. Signed-off-by: Ido Yariv <ido@wizery.com>	2016-03-25 12:19:18 -04:00
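
Commit 1cd71dfd71 in the log above describes converting a `*Sec` systemd property given in seconds (floats included) into the matching `*USec` property in microseconds. The following is a minimal sketch of that idea only, not runc's implementation: the helper name `secToUSec`, the `maxSec` bound, and the use of plain strings instead of dbus variant types are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// maxSec is an assumed upper bound for this sketch, chosen so that the
// resulting microsecond count still fits in a uint64.
const maxSec = math.MaxUint64 / 1e6

// secToUSec is a hypothetical helper (not part of runc). It renames a
// property ending in "Sec" to end in "USec" and converts the value,
// given in seconds, to whole microseconds.
func secToUSec(name, value string) (string, uint64, error) {
	// "TimeoutStopUSec" also ends in "Sec", so exclude it explicitly.
	if !strings.HasSuffix(name, "Sec") || strings.HasSuffix(name, "USec") {
		return "", 0, fmt.Errorf("%q does not have a plain Sec suffix", name)
	}
	sec, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return "", 0, fmt.Errorf("parsing %q: %w", value, err)
	}
	// Reject NaN, negative, and out-of-range (including +Inf) durations.
	if math.IsNaN(sec) || sec < 0 || sec >= maxSec {
		return "", 0, fmt.Errorf("duration %v is out of range", sec)
	}
	usec := uint64(math.Round(sec * 1e6))
	return strings.TrimSuffix(name, "Sec") + "USec", usec, nil
}

func main() {
	name, usec, err := secToUSec("TimeoutStopSec", "1.5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, usec) // prints "TimeoutStopUSec 1500000"
}
```

The commit message notes that the real conversion has to handle several dbus numeric types explicitly; this sketch sidesteps that by accepting the value as text.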

47 Commits
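
Commit 06f789cf26 in the log above splits the old `Rootless` flag into `RootlessEUID` and `RootlessCgroups`. The rules stated in that message can be sketched as below. This is an illustration under assumptions, not runc's code: the type `rootlessFlags`, the function `decideRootless`, and the `inUserNS` parameter are hypothetical names, and the sketch models only the automatic case, not an explicit `--rootless=true|false` override.

```go
package main

import "fmt"

// rootlessFlags is a hypothetical stand-in for the two booleans the
// commit message describes.
type rootlessFlags struct {
	EUID    bool // runc runs as a non-root user (euid != 0)
	Cgroups bool // runc is unlikely to have full access to cgroups
}

// decideRootless applies the rules from the commit message:
//   - EUID is true when euid != 0;
//   - Cgroups is false only for root (euid == 0) in the initial
//     user namespace, and true otherwise.
func decideRootless(euid int, inUserNS bool) rootlessFlags {
	return rootlessFlags{
		EUID:    euid != 0,
		Cgroups: !(euid == 0 && !inUserNS),
	}
}

// stateRootless mirrors the note about state.json: the persisted
// rootless value is true only when both flags are true.
func stateRootless(f rootlessFlags) bool {
	return f.EUID && f.Cgroups
}

func main() {
	cases := []struct {
		label    string
		euid     int
		inUserNS bool
	}{
		{"root, initial namespace", 0, false},
		{"root, inside a user namespace", 0, true},
		{"non-root user", 1000, false},
	}
	for _, c := range cases {
		f := decideRootless(c.euid, c.inUserNS)
		fmt.Printf("%s: RootlessEUID=%v RootlessCgroups=%v state.json rootless=%v\n",
			c.label, f.EUID, f.Cgroups, stateRootless(f))
	}
}
```

The middle case is the "runc-in-userns" scenario from the message: it behaves like rootful runc except that cgroup errors are tolerated.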