The main update is actually in github.com/opencontainers/cgroups, but we
need to also update runtime-spec to a newer pre-release version to get
the updates from there as well.
In short, the behaviour change is that "0" is now treated as a valid
value to set in "pids.max", "-1" means "max", and unset/nil means "do
nothing". As described in the opencontainers/cgroups PR, this change is
actually backwards compatible because our internal state.json stores
PidsLimit, and that entry is marked as "omitempty". So, an old runc
would omit PidsLimit=0 in state.json, and this will be parsed by a new
runc as being "nil" -- and both would treat this case as "do not set
anything".
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Error out on --l3-cache-schema and --mem-bw-schema if the original
spec didn't specify intelRdt, which also means that no CLOS (resctrl
group) was created for the container.
This prevents serious issues in this corner case.
First, a CLOS was created but the schemata of the CLOS was not
correctly updated. Confusingly, calling runc update twice
did the job: the first call created the resctrl group and the second
call was able to update the schemata. This issue would be relatively
easy to fix, though.
The second, more severe issue is that creating new CLOSes this way
caused them to be orphaned, i.e. not removed when the container exits.
This is caused by runc not capturing the updated state (the original
spec had intelRdt=nil -> no CLOS, but after the update this is no
longer the case).
The most severe problem is that the update only moved (or tried to
move) the pid of the original init process, while all children escaped
the update. Doing this (i.e. migrating all processes of a container
from one CLOS to another) reliably and race-free would probably require
freezing the container.
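The resulting check is roughly the following (variable and field names
are illustrative, not the exact code):
```
// Refuse schema updates when the container was created without intelRdt,
// i.e. when no CLOS (resctrl group) exists for it.
if config.IntelRdt == nil && (l3CacheSchema != "" || memBwSchema != "") {
	return errors.New("updating l3CacheSchema or memBwSchema requires intelRdt to be set in the original spec")
}
```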
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Prevent --l3-cache-schema from clearing the intel_rdt.memBwSchema state,
and --mem-bw-schema from clearing l3_cache_schema, respectively.
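Roughly speaking (field names assumed for illustration), only the schema
that was explicitly passed gets overwritten:
```
if l3CacheSchema != "" {
	config.IntelRdt.L3CacheSchema = l3CacheSchema // keep memBwSchema as is
}
if memBwSchema != "" {
	config.IntelRdt.MemBwSchema = memBwSchema // keep l3CacheSchema as is
}
```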
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
In case there's a duplicate in the device list, the latter entry
overrides the former one.
So, we need to modify the last entry, not the first one. To do that,
use slices.Backward.
Amend the test case to test the fix.
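A sketch of the idea (device fields and variable names are illustrative):
```
// Walk the list from the end so that, when duplicates exist, we modify
// the entry that actually takes effect (the last one).
for i, d := range slices.Backward(config.Devices) {
	if d.Path == dev.Path {
		config.Devices[i] = dev
		break
	}
}
```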
Reported-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This support was missing from runc, and thus the example from
podman-update wasn't working.
To fix, introduce a function to either update or insert new weights and iops.
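A rough sketch of such a helper for weight devices (type and field names
are assumptions; the same idea applies to the IOPS/BPS entries):
```
// upsertWeightDevice updates an existing entry for the same device,
// or appends a new one if no entry exists yet.
func upsertWeightDevice(list []*WeightDevice, wd *WeightDevice) []*WeightDevice {
	for i, d := range list {
		if d.Major == wd.Major && d.Minor == wd.Minor {
			list[i] = wd
			return list
		}
	}
	return append(list, wd)
}
```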
Add integration tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes the libcontainer/cgroups packages and starts
using those from the github.com/opencontainers/cgroups repo.
Mostly generated by:
git rm -f libcontainer/cgroups
find . -type f -name "*.go" -exec sed -i \
's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' \
{} +
go get github.com/opencontainers/cgroups@v0.0.1
make vendor
gofumpt -w .
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Prior to this commit, commands like `runc update --cpuset-cpus=1`
were implicitly setting cpu burst to "0" (which does not mean "leave it as is").
This was failing when the kernel does not support cpu burst:
`openat2 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-22167/cpu.max.burst: no such file or directory`
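A sketch of the fix (flag and field names are assumptions): only touch
the burst value when it is explicitly requested, and leave it nil
otherwise, meaning "do not change".
```
if context.IsSet("cpu-burst") {
	v := context.Uint64("cpu-burst")
	r.CpuBurst = &v // nil means "leave the current burst value as is"
}
```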
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The burstable CFS controller was introduced in Linux 5.14. It helps with
parallel workloads that might be bursty: they can get throttled even
when their average utilization is under quota, and they may be latency
sensitive at the same time, so throttling them is undesired.
This feature borrows time now against the future underrun, at the cost
of increased interference against the other system users, by introducing
cfs_burst_us into CFS bandwidth control to enact the cap on unused
bandwidth accumulation, which can then additionally be used for burst.
The patch adds the support/control for CFS bandwidth burst.
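For cgroup v2 this roughly boils down to writing the new value (a
sketch, assuming the fs2 driver and the existing cgroups.WriteFile
helper):
```
if r.CpuBurst != nil {
	// cpu.max.burst takes the burst time in microseconds.
	if err := cgroups.WriteFile(dirPath, "cpu.max.burst", strconv.FormatUint(*r.CpuBurst, 10)); err != nil {
		return err
	}
}
```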
runtime-spec: https://github.com/opencontainers/runtime-spec/pull/1120
Co-authored-by: Akihiro Suda <suda.kyoto@gmail.com>
Co-authored-by: Nadeshiko Manju <me@manjusaka.me>
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
This is aimed at solving the problem of the cgroup v2 memory controller
behavior, which is not compatible with that of cgroup v1.
In cgroup v1, if the new memory limit being set is lower than the
current usage, setting the new limit fails.
In cgroup v2, the same operation succeeds, and the container is OOM killed.
Introduce a new setting, memory.checkBeforeUpdate, and use it to mimic
cgroup v1 behavior.
Note that this is not 100% reliable because of TOCTOU, but this is the
best we can do.
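A rough sketch of the check in the cgroup v2 memory setter (simplified;
the exact wiring is an assumption):
```
if r.CheckBeforeUpdate && newLimit > 0 {
	usage, err := fscommon.GetCgroupParamUint(dirPath, "memory.current")
	if err != nil {
		return err
	}
	if uint64(newLimit) < usage {
		// Mimic cgroup v1: refuse to set a limit below current usage.
		return fmt.Errorf("rejecting memory limit %d: lower than current usage %d", newLimit, usage)
	}
}
```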
Add some test cases.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This was added by commit 5aa82c950 back in the day when we thought
runc was going to be cross-platform. It is very clear now that it is a
Linux-only package.
While at it, further clarify in the README that we are Linux-only.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The two exceptions I had to add to codespellrc are:
- CLOS (used by intelrdt);
- creat (syscall name used in tests/integration/testdata/seccomp_*.json).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This should result in no change when the error is printed, but makes the
returned errors unwrappable, meaning errors.As and errors.Is will work.
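For example (the error message is illustrative):
```
// Before: fmt.Errorf("unable to freeze: %v", err) -- the underlying error is lost.
// After:  %w keeps it available to errors.Is / errors.As.
return fmt.Errorf("unable to freeze: %w", err)
```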
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 52390d6804 made these parameters obsolete, but they are
still shown in e.g. runc update --help output.
Hide them (and maybe in 5 years we can remove them).
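With urfave/cli, hiding a flag boils down to setting Hidden (the flag
shown here is illustrative):
```
cli.StringFlag{
	Name:   "kernel-memory",
	Usage:  "(obsoleted; do not use)",
	Hidden: true, // still accepted, but no longer shown in --help
},
```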
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The runc update CLI is not able to modify devices, so let's set SkipDevices
(so that a cgroup controller won't try to update the devices cgroup).
This helps use cases where some other device management solution (e.g.
NVIDIA GPUs) applies its configuration on top of what runc does.
Make sure we do not save SkipDevices into state.json.
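The intent, in a simplified sketch:
```
// Tell the cgroup drivers not to touch the devices controller during
// this update; the flag is excluded from the serialized state so it
// never ends up in state.json.
config.Cgroups.SkipDevices = true
```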
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.
Brought to you by
git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w
Looking at the diff, all these changes make sense.
Also, replace gofmt with gofumpt in golangci.yml.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a somewhat radical approach to dealing with kernel memory.
Per-cgroup kernel memory limiting was always problematic. A few
examples:
- older kernels had bugs and were even oopsing sometimes (the best
example is the RHEL7 kernel);
- the kernel is unable to reclaim kernel memory, so once the limit is
hit, a cgroup is toasted;
- some kernel memory allocations don't allow failing.
In addition to that,
- users don't have a clue about how to set kernel memory limits
(as the concept is much more complicated than e.g. [user] memory);
- different kernels might have different kernel memory usage,
which is sort of unexpected;
- cgroup v2 does not have a [dedicated] kmem limit knob, and thus
runc silently ignores kernel memory limits for v2;
- kernel v5.4 made cgroup v1 kmem.limit obsolete (see
https://github.com/torvalds/linux/commit/0158115f702b).
In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL, let's ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).
This should result in fewer bugs and a better user experience.
The only bad side effect might be that stats can show kernel
memory usage as 0 (since the accounting is not enabled).
[v2: add a warning in specconv that limits are ignored]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Setting CPU quota and period independently does not make much sense,
but historically runc allowed it, and this needs to be supported
to not break compatibility.
For the systemd cgroup drivers to set CPU quota/period correctly,
they need to know both values. For the fs2 cgroup driver to be
compatible with the fs driver, it also needs to know both values.
Here in update, previously set values are available from the config.
If only one of {quota, period} is set and the other is not, leave
the unset parameter at its old value (don't overwrite the config).
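A sketch of the logic (config paths abbreviated; 0 is treated as "not
set", as described above):
```
// r holds the new values parsed from the command line; config holds the
// container's current configuration.
if r.CpuQuota == 0 {
	r.CpuQuota = config.Cgroups.Resources.CpuQuota // keep the old quota
}
if r.CpuPeriod == 0 {
	r.CpuPeriod = config.Cgroups.Resources.CpuPeriod // keep the old period
}
```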
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This (and the conversion function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let fs2 do the same.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.
Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm
In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by the kernel
config CONFIG_INTEL_RDT. If the hardware supports it, the CPU flags
`rdt_a` and `mba` will be set in /proc/cpuinfo.
Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks
For MBA support in `runc`, we will reuse the infrastructure and code
base of Intel RDT/CAT, which was implemented in #1279. We can also make
use of the `tasks` and `schemata` configuration for memory bandwidth
resource constraints.
The file `tasks` has a list of tasks that belong to this group (e.g.,
the "<container_id>" group). Tasks can be added to a group by writing
the task ID to the "tasks" file (which will automatically remove them
from the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.
The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.
Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket; each entry
contains the L3 cache id and the memory bandwidth percentage.
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
An example for runc:
Consider a two-socket machine with two L3 caches, where the minimum
memory bandwidth is 10% and the memory bandwidth granularity is 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.
"linux": {
"intelRdt": {
"memBwSchema": "MB:0=20;1=70"
}
}
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Currently runc already supports setting the realtime runtime and period
before the container processes start; this commit adds update support
for realtime scheduler resources.
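For the cgroup v1 cpu controller this amounts to also writing the values
during update, along the lines of (a sketch, assuming a WriteFile-style
helper):
```
if r.CpuRtPeriod != 0 {
	if err := cgroups.WriteFile(path, "cpu.rt_period_us", strconv.FormatUint(r.CpuRtPeriod, 10)); err != nil {
		return err
	}
}
if r.CpuRtRuntime != 0 {
	if err := cgroups.WriteFile(path, "cpu.rt_runtime_us", strconv.FormatInt(r.CpuRtRuntime, 10)); err != nil {
		return err
	}
}
```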
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
grep -r "range map" showw 3 parts use map to
range enum types, use slice instead can get
better performance and less memory usage.
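For instance (type and value names are illustrative), keeping a slice of
the known enum values and ranging over that:
```
// Instead of ranging over a map keyed by the enum values, keep a slice
// of the known values: deterministic order, no map allocation.
var knownStatuses = []Status{Created, Running, Pausing, Paused, Stopped}

func isKnownStatus(s Status) bool {
	for _, k := range knownStatuses {
		if k == s {
			return true
		}
	}
	return false
}
```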
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
Back quotes are the placeholder feature described here:
https://github.com/urfave/cli#placeholder-values
Without this, cli will take `-1` as the placeholder value:
```
--memory-swap -1 Total memory usage (memory + swap); set `-1` to enable unlimited swap
```
After this patch, it acts correctly:
```
--memory-swap value Total memory usage (memory + swap); set '-1' to enable unlimited swap
```
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>