We intentionally broke this in commit d40b3439a9 ("rootfs: switch to
fd-based handling of mountpoint targets") under the assumption that most
users do not need this feature. Sadly it turns out they do, and so
commit 3f925525b4 ("rootfs: re-allow dangling symlinks in mount
targets") added a hotfix to re-add this functionality.
This patch adds some much-needed tests for this behaviour, since it
seems we are going to need to keep this for compatibility reasons (at
least until runc v2...).
Co-developed-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
On some systems (e.g., AlmaLinux 8), systemd automatically removes cgroup paths
when they become empty (i.e., contain no processes). To prevent this, we spawn
a dummy process to pin the cgroup in place.
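
A minimal sketch of the pinning idea, not the helper used in the test suite. The names pin_cgroup and unpin_cgroup and the directory argument are hypothetical. It parks a sleeping process in the cgroup by writing its PID to cgroup.procs, so the cgroup never becomes empty; on a real host this needs write access to the cgroup.

```shell
# Hypothetical helper: keep a cgroup non-empty by parking a sleeping
# process in it. $1 is the cgroup directory (an assumption of this sketch).
pin_cgroup() {
	# Long-lived dummy process; it does nothing except exist.
	sleep 3600 >/dev/null 2>&1 &
	PIN_PID=$!
	# Writing a PID to cgroup.procs moves that process into the cgroup.
	echo "$PIN_PID" > "$1/cgroup.procs"
}

# Kill the dummy process once the cgroup is no longer needed.
unpin_cgroup() {
	kill "$PIN_PID" 2>/dev/null || true
}
```

Calling unpin_cgroup from the test teardown would let the cgroup be removed again.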
Fix: https://github.com/opencontainers/runc/issues/5003
Signed-off-by: lifubang <lifubang@acmcoder.com>
This was always the intended behaviour but commit 72fbb34f50 ("rootfs:
switch to fd-based handling of mountpoint targets") regressed it when
adding a mechanism to create a file handle to the target if it didn't
already exist (causing the later stat to always succeed).
A lot of people depend on this functionality, so add some tests to make
sure we don't break it in the future.
Fixes: 72fbb34f50 ("rootfs: switch to fd-based handling of mountpoint targets")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This is mostly to improve readability. While at it, make the script more
robust by adding the -e option to the shell. The exception is echo $pid, which
is opportunistic and may fail depending on the order of pids in the file.
Also, remove the empty comment and a shellcheck annotation.
Fixes: c91fe9ae
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The "runc delete --force [paused container]" test case does not check
the runc pause exit code. Once that check is added, the test fails in
rootless mode, because:
- not all rootless tests have access to cgroups;
- rootless containers don't have a default cgroup path.
To fix, add:
- setup for rootless case;
- require cgroups_freezer;
- runc pause exit code check.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In our bats tests, runc itself is a wrapper which calls bats run helper,
so using "run runc" is wrong as it results in calling run helper twice.
Fixes: 8d180e965
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commands that are not run via "run" helper (cat, mkdir, __runc)
do not set $status, so it makes no sense to check it.
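
To illustrate, here is a simplified stand-in for the bats "run" helper (run_sketch is a hypothetical name, and the real helper does more). Only a command started through it updates $status; a plain command leaves whatever value was there before.

```shell
# Simplified stand-in for the bats "run" helper: capture the command's
# output and exit code into $output and $status.
run_sketch() {
	output="$("$@" 2>&1)" && status=0 || status=$?
}

status="stale"
true   # plain command: $status is not touched
echo "after plain command: status=$status"

run_sketch sh -c 'echo oops; exit 3'
echo "after run_sketch: status=$status output=$output"
```

The first echo still prints the stale value, which is why checking $status after cat, mkdir, or __runc tests nothing.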
Fixes: 94505a04, ed548376
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a bit opinionated, but some comments in integration tests do not
really help to understand the nature of the tests being performed by
stating something very obvious, like
# run busybox detached
runc run -d busybox
To make things worse, these not-so-helpful messages are being
copy/pasted over and over, and that is the main reason to remove them.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Remove the devicemapper driver mentions, as it is no longer
supported by docker (or podman).
2. Remove the test example -- we have plenty of real ones.
3. Add a link to (well written and extensive) bats documentation.
4. Fix capitalization in a sentence.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes `mips64le` (no longer supported by the image / upstream in Debian Trixie+) and adds `riscv64`.
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.
This requires newer opencontainers/cgroups and coreos/go-systemd.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When a non–page-aligned value is written to memory.max, the kernel aligns it
down to the nearest page boundary. On systems with a page size greater
than 4K (e.g., 64K), this caused failures because the configured
memory.max value was not 64K aligned.
This patch fixes the issue by explicitly aligning the memory.max value
to 64K. Since 64K is also a multiple of 4K, the value is correctly
aligned on both 4K and 64K page size systems.
However, this approach will still fail on systems where the hardcoded
memory.max value is not aligned to the system page size.
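
The arithmetic can be sketched as follows. The helper names align_down and is_aligned are hypothetical, and the limit is an illustrative number, not the value used in the tests.

```shell
# Round a byte count down to a multiple of the page size, mirroring the
# kernel behaviour described above.
align_down() {
	echo $(( $1 / $2 * $2 ))
}

# Succeed if $1 is a multiple of $2.
is_aligned() {
	[ $(( $1 % $2 )) -eq 0 ]
}

page_size=$(getconf PAGESIZE)
limit=$(( 50 * 1024 * 1024 + 4096 ))   # 4K-aligned, but not 64K-aligned
echo "requested:    $limit"
echo "on 4K pages:  $(align_down "$limit" 4096)"
echo "on 64K pages: $(align_down "$limit" 65536)"
echo "on this host (page size $page_size): $(align_down "$limit" "$page_size")"
```

A value that passes is_aligned for 65536 also passes for 4096, which is why aligning to 64K works on both page sizes.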
Fixes: https://github.com/opencontainers/runc/issues/4841
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
1. If the runc binary file name is not runc, the test fails as shown
below. The fix is to get the binary name from $RUNC.
✗ runc command -h
(in test file tests/integration/help.bats, line 27)
`[[ ${lines[1]} =~ runc\ checkpoint+ ]]' failed
runc-go1.25.0-main checkpoint -h (status=0):
NAME:
runc-go1.25.0-main checkpoint - checkpoint a running container
2. Simplify the test by adding a loop for all commands. While at it, add
a loop for -h --help as well.
3. Add missing commands (create, ps, features).
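
A rough sketch of items 1 and 2, assuming $RUNC holds the path to the binary under test. The helpers runc_name and help_line_ok are hypothetical, and the help line is canned here rather than taken from a real invocation.

```shell
# Derive the name that appears in help output from $RUNC instead of
# hardcoding "runc".
runc_name() {
	basename "$RUNC"
}

# Succeed if help line $1 contains "<binary name> <command $2>".
help_line_ok() {
	case "$1" in
	*"$(runc_name) $2"*) return 0 ;;
	*) return 1 ;;
	esac
}

RUNC=/usr/local/bin/runc-go1.25.0-main
for cmd in checkpoint create ps features; do
	# In a real test this line would come from running "$RUNC" "$cmd" -h.
	line="   $(runc_name) $cmd - some description"
	help_line_ok "$line" "$cmd" && echo "ok: $cmd"
done
```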
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The setup in selinux.bats assumes $RUNC binary name ends in runc, and
thus it fails when we run it like this:
sudo -E RUNC=$(pwd)/runc.patched bats tests/integration/selinux.bats
Fix is easy.
Fixes: b39781b06 ("tests/int: add selinux test case")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In certain deployments, it's possible for runc to be spawned by a
process with a restrictive cpumask (such as from a systemd unit with
CPUAffinity=... configured) which will be inherited by runc and thus the
container process by default.
The cpuset cgroup used to reconfigure the cpumask automatically for
joining processes, but kcommit da019032819a ("sched: Enforce user
requested affinity") changed this behaviour in Linux 6.2.
The solution is to try to emulate the expected behaviour by resetting
our cpumask to correspond with the configured cpuset (in the case of
"runc exec", if the user did not configure an alternative one). Normally
we would have to parse /proc/stat and /sys/fs/cgroup, but luckily
sched_setaffinity(2) will transparently convert an all-set cpumask (even
if it has more entries than the number of CPUs on the system) to the
correct value for our usecase.
For some reason, in our CI it seems that rootless --systemd-cgroup
results in the cpuset (presumably temporarily?) being configured such
that sched_setaffinity(2) will allow the full set of CPUs. For this
particular case, all we care about is that it is different to the
original set, so include some special-casing (but we should probably
investigate this further...).
Reported-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Reported-by: Martin Sivak <msivak@redhat.com>
Reported-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Sometimes we need to run runc through some wrapper (like nohup), but
because "__runc" and "runc" are bash functions in our test suite this
doesn't work trivially -- and you cannot just pass "$RUNC" because you
need to set --root for rootless tests.
So create a setup_runc_cmdline helper which sets $RUNC_CMDLINE to the
beginning cmdline used by __runc (and switch __runc to use that).
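
One possible shape for such a helper, as a bash sketch only. It assumes $RUNC and $ROOT come from the test environment and that the state directory is "$ROOT/state"; the real helper may add other flags.

```shell
#!/usr/bin/env bash
# Sketch only: build the common runc command-line prefix as a bash array.
setup_runc_cmdline() {
	RUNC_CMDLINE=("$RUNC")
	# Rootless tests need a private state directory passed via --root.
	if [ -n "${ROOT:-}" ]; then
		RUNC_CMDLINE+=(--root "$ROOT/state")
	fi
}

# __runc expands the prefix and then appends the caller's arguments.
__runc() {
	setup_runc_cmdline
	"${RUNC_CMDLINE[@]}" "$@"
}
```

Because the prefix is an array rather than a function, a wrapper can be put in front of it, for example: nohup "${RUNC_CMDLINE[@]}" list.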
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
"runc" was a special wrapper around bats's "run" which output some very
useful diagnostic information to the bats log, but this was not usable
for other commands. So let's make it a more generic helper that we can
use for other commands.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
openSUSE has an unfortunate default udev setup which forcefully sets all
loop devices to use the "none" scheduler, even if you manually set it.
As this is a property of the host configuration (and udev is monitoring
from the host) we cannot really change this behaviour from inside our
test container.
So we should just skip the test in this (hopefully unusual) case.
Ideally tools running the test suite should disable this behaviour when
running our test suite.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If an error occurs during a test which sets up loopback devices, the
loopback device is not freed. Since most systems have very conservative
limits on the number of loopback devices, re-running a failing test
locally to debug it often ends up erroring out due to loopback device
exhaustion.
So let's just move the "losetup -d" to teardown, where it belongs.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Apparently, having a minor of 0 does not always mean it's the
whole device (not a partition):
=== /proc/partitions (using major: 259) ===
major minor #blocks name
8 16 78643200 sdb
8 17 77593583 sdb1
8 30 4096 sdb14
8 31 108544 sdb15
259 0 934912 sdb16
8 0 78643200 sda
8 1 78641152 sda1
Rewrite the test to not assume minor is 0, and use
lsblk -d to find out whole devices.
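
A small sketch of the idea; the exact lsblk invocation in the test may differ. first_whole_disk is a hypothetical helper that reads "MAJ:MIN TYPE" lines, as printed by lsblk -dn -o MAJ:MIN,TYPE, and returns the first entry of type disk, whatever its minor number is.

```shell
# Pick the first whole disk from "lsblk -d" style output on stdin.
# Expected columns: MAJ:MIN TYPE.
first_whole_disk() {
	awk '$2 == "disk" { print $1; exit }'
}

# On a real host:
#   lsblk -dn -o MAJ:MIN,TYPE | first_whole_disk
# Canned input for illustration:
printf '%s\n' '  7:0 loop' '  8:16 disk' '  8:0 disk' | first_whole_disk
```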
This fixes a test case which was added in commit 7696402da.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The dmem controller was added in kernel v6.13 and is now enabled in
Fedora 42 kernels. Yet, systemd is not aware of dmem.
This fixes the test case failure on Fedora.
For the initial test case, see commit 27515719.
For earlier commits similar to this one, see
commits 601cf582, 05272718, e83ca519.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Implement support for passing Linux Network Devices to the container
network namespace.
The network device is passed during the creation of the container,
before the process is started.
It implements the logic defined in the OCI runtime specification.
Signed-off-by: Antonio Ojea <aojea@google.com>
In case there's a duplicate in the device list, the latter entry
overrides the former one.
So, we need to modify the last entry, not the first one. To do that,
use slices.Backward.
Amend the test case to test the fix.
Reported-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This support was missing from runc, and thus the example from
podman-update wasn't working.
To fix, introduce a function to either update or insert new weights and iops.
Add integration tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of providing systemd CPU quota value (CPUQuotaPerSec),
calculate it based on how opencontainers/cgroups/systemd handles
it (see addCPUQuota).
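
A sketch of the calculation, under the assumption that addCPUQuota computes quota * 1000000 / period and rounds the result up to a multiple of 10000 microseconds (10ms). That rounding rule is recalled from the library and should be checked against its source; cpu_quota_per_sec_usec is a hypothetical name.

```shell
# Convert a cgroup CPU quota/period pair (microseconds) into the value
# expected for systemd's CPUQuotaPerSecUSec, under the assumed rule above.
cpu_quota_per_sec_usec() {
	quota="$1" period="$2"
	usec=$(( quota * 1000000 / period ))
	# Round up to the next multiple of 10ms if needed.
	if [ $(( usec % 10000 )) -ne 0 ]; then
		usec=$(( (usec / 10000 + 1) * 10000 ))
	fi
	echo "$usec"
}

cpu_quota_per_sec_usec 50000 100000   # exact multiple, no rounding
cpu_quota_per_sec_usec 33333 100000   # rounded up to the next 10ms
```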
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For some reason, ssh-keygen is unable to write to /root even as root on
AlmaLinux 8:
# id
uid=0(root) gid=0(root) groups=0(root) context=system_u:system_r:initrc_t:s0
# id -Z
ls -ld /root
# ssh-keygen -t ecdsa -N "" -f /root/rootless.key || cat /var/log/audit/audit.log
Saving key "/root/rootless.key" failed: Permission denied
The audit.log shows:
> type=AVC msg=audit(1744834995.352:546): avc: denied { dac_override } for pid=13471 comm="ssh-keygen" capability=1 scontext=system_u:system_r:ssh_keygen_t:s0 tcontext=system_u:system_r:ssh_keygen_t:s0 tclass=capability permissive=0
> type=SYSCALL msg=audit(1744834995.352:546): arch=c000003e syscall=257 success=no exit=-13 a0=ffffff9c a1=5641c7587520 a2=241 a3=180 items=0 ppid=4978 pid=13471 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ssh-keygen" exe="/usr/bin/ssh-keygen" subj=system_u:system_r:ssh_keygen_t:s0 key=(null)␝ARCH=x86_64 SYSCALL=openat AUID="unset" UID="root" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
A workaround is to use /root/.ssh directory instead of just /root.
While at it, let's unify rootless user and key setup into a single place.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>