Because we have the overlay solution, we can drop runc-dmz binary
solution since it has too many limitations.
Signed-off-by: lifubang <lifubang@acmcoder.com>
The following commands are moved from `contrib/cmd` to `tests/cmd`:
- fs-idmap
- pidfd-kill
- recvtty
- remap-rootfs
- sd-helper
- seccompagent
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
There's too much logic here figuring out which CPUs to use. Runc is a
low level tool and is not supposed to be that "smart". What's worse,
this logic is executed on every exec, making it slower. Some of the
logic in (*setnsProcess).start is executed even if no annotation is set,
thus making ALL execs slow.
Also, this should be a property of a process, rather than annotation.
The plan is to rework this.
This reverts commit afc23e3397.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This handles a corner case when joining a container having all
the processes running exclusively on isolated CPU cores to force
the kernel to schedule runc process on the first CPU core within the
cgroups cpuset.
The introduction of the kernel commit
46a87b3851f0d6eb05e6d83d5c5a30df0eca8f76 has affected this deterministic
scheduling behavior by distributing tasks across CPU cores within the
cgroups cpuset. Some intensive real-time application are relying on this
deterministic behavior and use the first CPU core to run a slow thread
while other CPU cores are fully used by real-time threads with SCHED_FIFO
policy. Such applications prevents runc process from joining a container
when the runc process is randomly scheduled on a CPU core owned by a
real-time thread.
Introduces isolated CPU affinity transition OCI runtime annotation
org.opencontainers.runc.exec.isolated-cpu-affinity-transition to restore
the behavior during runc exec.
Fix issue with kernel >= 6.2 not resetting CPU affinity for container processes.
Signed-off-by: Cédric Clerget <cedric.clerget@gmail.com>
In reviewing PR 4024 ("libct/dmz: Reduce the binary size using nolibc"),
we noticed that we do not intend to actively support MIPS.
We do not intend to support i386 either.
This might be a breaking change for Debian, which has been officially
providing runc packages for `i386`, `mips64el` and `mipsel`:
https://packages.debian.org/bookworm/runc
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Burstable CFS controller is introduced in Linux 5.14. This helps with
parallel workloads that might be bursty. They can get throttled even
when their average utilization is under quota. And they may be latency
sensitive at the same time so that throttling them is undesired.
This feature borrows time now against the future underrun, at the cost
of increased interference against the other system users, by introducing
cfs_burst_us into CFS bandwidth control to enact the cap on unused
bandwidth accumulation, which will then used additionally for burst.
The patch adds the support/control for CFS bandwidth burst.
runtime-spec: https://github.com/opencontainers/runtime-spec/pull/1120
Co-authored-by: Akihiro Suda <suda.kyoto@gmail.com>
Co-authored-by: Nadeshiko Manju <me@manjusaka.me>
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Apparently, developer.gnome.org/documentation no longer hosts the
documentation we used to refer to. Link to docs.gtk.org instead.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Systemd v252 (available in CentOS Stream 9 in our CI) added support
for setting cpu.idle (see [1]). The way it works is:
- if CPUWeight == 0, cpu.idle is set to 1;
- if CPUWeight != 0, cpu.idle is set to 0.
This commit implements setting cpu.idle in systemd cgroup driver via a
unit property. In case CPUIdle is set to non-zero value, the driver sets
adds CPUWeight=0 property, which will result in systemd setting cpu.idle
to 1.
Unfortunately, there's no way to set cpu.idle to 0 without also changing
the CPUWeight value, so the driver doesn't do anything if CPUIdle is
explicitly set to 0. This case is handled by the fs driver which is
always used as a followup to setting systemd unit properties.
Also, handle cpu.idle set via unified map. In case it is set to non-zero
value, add CPUWeight=0 property, and ignore cpu.weight (otherwise we'll
get two different CPUWeight properties set).
Add a unit test for new values in unified map, and an integration test case.
[1] https://github.com/systemd/systemd/pull/23299
[2] https://github.com/opencontainers/runc/issues/3786
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The two exceptions I had to add to codespellrc are:
- CLOS (used by intelrtd);
- creat (syscall name used in tests/integration/testdata/seccomp_*.json).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Explain where the "/dev/tty: no such device or address" error is coming
from, and provide ways to solve the issue.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Move docs/systemd-properties.md to docs/systemd.md
2. Document the cgroupsPath to systemd unit name and slice conversion
rules, as well as mapping of OCI runtime spec resource limits to
systemd unit properties.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
After a lot of refactoring, our cgroup v1 and v2 drivers now have same level of implementation quality,
so we can move the v2 driver out of experimental.
Close#2663
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This commit removes the unnecessary ampersand.
Especially, it causes the error of "ambiguous redirect" when use bash.
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
I realised that the terminal documentation which covers detached
terminals fails to mention that callers need to make themselves a
subreaper. Probably a good idea to mention this. I've also included a
minor comparison to LXC.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Some systemd properties are documented as having "Sec" suffix
(e.g. "TimeoutStopSec") but are expected to have "USec" suffix
when passed over dbus, so let's provide appropriate conversion
to improve compatibility.
This means, one can specify TimeoutStopSec with a numeric argument,
in seconds, and it will be properly converted to TimeoutStopUsec
with the argument in microseconds. As a side bonus, even float
values are converted, so e.g. TimeoutStopSec=1.5 is possible.
This turned out a bit more tricky to implement when I was
originally expected, since there are a handful of numeric
types in dbus and each one requires explicit conversion.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Users can get very confused by how terminals work with runc, and the
quite confusing "terminal: ..." option. Add a document which goes
through all of the important parts of terminal handling in runc, in the
hopes that we can just point people to this as an explanation.
Signed-off-by: Avi Deitcher <avi@deitcher.net>
[cyphar: quite a large rewrite to fix factual errors and structure]
Co-authored-by: Avi Deitcher <avi@deitcher.net>
Signed-off-by: Aleksa Sarai <asarai@suse.de>