I don't want to implement it now, because this might result in some
new issues, but this is definitely something that is worth implementing.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The v6.0.0 release of go-criu has deprecated the `rpc` package in favour
of the `crit` package. This commit provides the changes required to use
this version in runc.
Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
The only implementation of these is linuxContainer. It does not make
sense to have an interface with a single implementation, and we do not
foresee other types of containers being added to runc.
Remove the BaseContainer and Container interfaces, moving their methods'
documentation to linuxContainer.
Rename linuxContainer to Container.
Convert users of the interface to use the struct instead.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Function strings.Title is deprecated as of Go 1.18, because it does not
handle some corner cases well enough. In this case, though, it is
perfectly fine to use it since we have a single ASCII word as an
argument, and strings.Title won't be removed until at least Go 2.0.
Suppress the deprecation warning.
The alternative is to not capitalize the namespace string; this would break
restoring a container checkpointed by an earlier version of runc.
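A minimal sketch of what the suppression can look like, assuming the warning comes from golangci-lint's staticcheck (check SA1019). The helper name `capitalizeNS` is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// capitalizeNS upper-cases the first letter of a namespace name such as
// "net" or "pid". The argument is always a single ASCII word, so the
// Unicode corner cases behind the deprecation of strings.Title do not apply.
func capitalizeNS(ns string) string {
	return strings.Title(ns) //nolint:staticcheck // SA1019: fine for a single ASCII word.
}

func main() {
	fmt.Println(capitalizeNS("net")) // prints "Net"
}
```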
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Remove the intelrdt.Manager interface, since we only have a single
implementation and do not expect another one.
Rename intelRdtManager to Manager, and modify its users accordingly.
Remove NewIntelRdtManager from factory.
Remove IntelRdtfs. Instead, make intelrdt.NewManager return nil if the
feature is not available.
Remove TestFactoryNewIntelRdt as it is now identical to TestFactoryNew.
Add an internal function newManager to be used in tests (to make sure
some testing is done even when the feature is not available in the
kernel/hardware).
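A sketch of the nil-returning constructor pattern described above. `rdtAvailable`, the `id` field and the `Apply` method are hypothetical placeholders, not the real intelrdt API; the point is that callers need only a nil check instead of an interface with a dummy implementation:

```go
package main

import "fmt"

// Manager is a hypothetical stand-in for the single concrete manager type.
type Manager struct {
	id string
}

// rdtAvailable is a hypothetical probe; a real one would check the
// kernel and hardware for Intel RDT support.
func rdtAvailable() bool { return false }

// NewManager returns nil if the feature is not available.
func NewManager(id string) *Manager {
	if !rdtAvailable() {
		return nil
	}
	return newManager(id)
}

// newManager skips the availability check, so tests can exercise the
// manager even on machines without the feature.
func newManager(id string) *Manager {
	return &Manager{id: id}
}

// Apply is a placeholder for the real work done by the manager.
func (m *Manager) Apply(pid int) error {
	fmt.Printf("applying %s to pid %d\n", m.id, pid)
	return nil
}

func main() {
	if m := NewManager("ctr1"); m != nil {
		_ = m.Apply(1234)
	} else {
		fmt.Println("intel RDT not available, skipping")
	}
}
```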
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since we are looking up the path to newuidmap/newgidmap in one context
and executing them in another (libct/nsenter), it might make sense to
use stricter rules for looking up the path to those binaries.
In practice, this means that if someone wants to use custom newuidmap and
newgidmap binaries from $PATH, it will not be possible to use ones from
the current directory by means of PATH=.:$PATH; instead, one would have
to do something like PATH=$(pwd):$PATH.
See https://go.dev/blog/path-security for background.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
These were introduced in commit d8b669400 back in 2017, with a TODO
of "make binary names configurable". Apparently, everyone is happy with
the hardcoded names. In fact, they *are* configurable (by prepending to
PATH a directory containing one's own version of newuidmap/newgidmap).
Now, these binaries are only needed in a few specific cases (when
rootless is set, etc.), so let's look them up only when needed.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
These are *always* /proc/self/exe init, and it does not make sense
to ever change them. Moreover, if the InitArgs option func (removed
by this commit) is used to change these parameters, it will break
things, since "init" is hardcoded elsewhere.
Remove this.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This was introduced in an initial commit, back in the day when criu was
a highly experimental thing. Today it's not; most users who need it have
it packaged by their distro vendor.
The usual way to run a binary is to look it up in directories listed in
$PATH. This is flexible enough and allows for multiple scenarios (custom
binaries, extra binaries, etc.). This is the way criu should be run.
Make --criu a hidden option (thus removing it from help). Remove the
option from man pages, integration tests, etc. Remove all traces of
CriuPath from data structures.
Add a warning that --criu is ignored and will be removed.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since the next commit is going to touch this structure, our CI
(lint-extra) is about to complain about an improperly named field:
> Warning: var-naming: struct field ContainerId should be ContainerID (revive)
Make it happy.
Brought to you by gopls rename.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The utils.Annotations was used here before only because it made it
possible to distinguish between "key not found" and "empty value" cases.
With the previous commit, utils.SearchLabels can do that, and so it
makes sense to use it.
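A hypothetical re-implementation showing the comma-ok style of lookup that makes this distinction possible; the real utils.SearchLabels signature may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// searchLabels looks up key in a list of "key=value" strings. The boolean
// result distinguishes "key not found" from "key present with empty value".
func searchLabels(labels []string, key string) (string, bool) {
	for _, l := range labels {
		k, v, ok := strings.Cut(l, "=")
		if ok && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	labels := []string{"bundle=/run/ctr", "empty="}
	v, ok := searchLabels(labels, "empty")
	fmt.Printf("%q %v\n", v, ok) // prints "" true
	v, ok = searchLabels(labels, "missing")
	fmt.Printf("%q %v\n", v, ok) // prints "" false
}
```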
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When writing netlink messages, it is possible to have a byte array
larger than UINT16_MAX which would result in the length field
overflowing and allowing user-controlled data to be parsed as control
characters (such as creating custom mount points, changing which set of
namespaces to allow, and so on).
Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Using null bytes as control characters for sending strings via netlink
opens us up to a user explicitly putting a null byte in a mount string
(which JSON will happily let you do) and then causing us to open a mount
path different from the one expected.
In practice this is more of an issue in an environment such as
Kubernetes where you may have path-based access control policies (which
are more susceptible to these kinds of flaws).
Found by Google Project Zero.
Fixes: 9c444070ec ("Open bind mount sources from the host userns")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The source of the bind mount might not be accessible in a different user
namespace because a component of the source path might not be traversable
by the users and groups mapped inside the user namespace. This caused
errors such as the following:
# time="2020-06-22T13:48:26Z" level=error msg="container_linux.go:367:
starting container process caused: process_linux.go:459:
container init caused: rootfs_linux.go:58:
mounting \"/tmp/busyboxtest/source-inaccessible/dir\"
to rootfs at \"/tmp/inaccessible\" caused:
stat /tmp/busyboxtest/source-inaccessible/dir: permission denied"
To solve this problem, this patch performs the following:
1. In nsexec.c, it opens the source path in the host userns (so we have
the right permissions to open it) but in the container mntns (so the
kernel cross-mntns mount check lets us mount it later:
https://github.com/torvalds/linux/blob/v5.8/fs/namespace.c#L2312).
2. In nsexec.c, it passes the file descriptors of the source to the
child process with SCM_RIGHTS.
3. In runc-init (Go code), it finishes the mounts while inside the
userns, even without access to some components of the source
paths.
Passing the fds with SCM_RIGHTS is necessary because once the child
process is in the container mntns, it is already in the container userns
so it cannot temporarily join the host mntns.
This patch uses the existing mechanism with _LIBCONTAINER_* environment
variables to pass the file descriptors from runc to runc init.
This patch uses the existing mechanism with the Netlink-style bootstrap
to pass information about the list of source mounts to nsexec.c.
Rootless containers don't use this bind mount sources fdpassing
mechanism because we can't setns() to the target mntns in a rootless
container (we don't have the privileges when we are in the host userns).
This patch takes care of using O_CLOEXEC on mount fds and closing them
early.
Fixes: #2484.
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
In some setups, multiple cgroups are used inside a container,
and sometimes there is a need to execute a process in a particular
sub-cgroup (in case of cgroup v1, for a particular controller).
This is what this commit implements.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Wire through CRIU's support to change the mount context on restore.
This is especially useful if restoring a container in a different pod.
A single-container restore uses the same SELinux process label and
the same mount context as during checkpointing. If a container is being
restored into an existing pod, the process label and the mount context
need to be changed to the context of the pod.
Changing process label on restore is already supported by runc. This
patch adds the possibility to change the mount context.
Signed-off-by: Adrian Reber <areber@redhat.com>
runc delete -f is not working for a paused container, since in cgroup v1
SIGKILL does nothing if a process is frozen (unlike cgroup v2, in which
you can kill a frozen process with a fatal signal).
Theoretically, we only need this for v1, but doing it for v2 as well is
OK.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For files that end with _linux.go or _linux_test.go, there is no need to
specify the linux build tag, as it is implied by the file name.
In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go
so that the file name implies the constraint as well.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
ConfigError was added by commit e918d02139, while removing runc own
error system, to preserve a way for a libcontainer user to distinguish
between a configuration error and something else.
The way ConfigError is implemented requires a different type of check
(compared to all other errors defined by error.go). An attempt was made
to rectify this, but the resulting code became even more complicated.
As no one is using this functionality (of differentiating a "bad config"
type of error from other errors), let's just drop the ConfigError type.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
runc resolves symlinks before doing a bind mount, so we should save
the original path while formatting CriuReq for dump and restore.
"checkpoint: resolve symlink for external bind mount" was previously merged
as da22625f6986f0ef196eaa1f8bb6adce098f0fb7 (PR 2902), and reverted
in commit 70fdc0573dced3464e9c31d674559f77c1de3973 (PR 3043) due to behavior changes
caused by commit 0ca91f44f1664da834bc61115a849b56d22f595f (Fixes: CVE-2021-30465).
Signed-off-by: Liu Hua <weldonliu@tencent.com>
The two exceptions I had to add to codespellrc are:
- CLOS (used by intelrdt);
- creat (syscall name used in tests/integration/testdata/seccomp_*.json).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes libcontainer's own error wrapping system, consisting of a
few types and functions aimed at classifying, wrapping, and unwrapping
errors, as well as saving error stack traces.
Since Go 1.13 now provides its own error wrapping mechanism and a few
related functions, it makes sense to switch to it.
While doing that, improve some error messages so that they start
with "error", "unable to", or "can't".
A few things that are worth mentioning:
1. We lose stack traces (which were never shown anyway).
2. Users of libcontainer that relied on particular errors (like
ContainerNotExists) need to switch to using errors.Is with
the new errors defined in error.go.
3. encoding/json is unable to unmarshal the built-in error type,
so we have to introduce initError and wrap the errors into it
(basically passing the error as a string). This is the same
as it was before, just a tad simpler (actually the initError
is a type that got removed in commit afa844311; also suddenly
ierr variable name makes sense now).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Errors from unix.* are always bare and thus can be used directly.
Add //nolint:errorlint annotation to ignore errors such as these:
libcontainer/system/xattrs_linux.go:18:7: comparing with == will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
case errno == unix.ERANGE:
^
libcontainer/container_linux.go:1259:9: comparing with != will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
if e != unix.EINVAL {
^
libcontainer/rootfs_linux.go:919:7: comparing with != will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
if err != unix.EINVAL && err != unix.EPERM {
^
libcontainer/rootfs_linux.go:1002:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
switch err {
^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This should result in no change when the error is printed, but makes the
errors returned unwrappable, meaning errors.As and errors.Is will work.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Errors returned by unix are bare. In some cases it's impossible to find
out what went wrong because there is not enough context.
Add a mountError type (mostly copy-pasted from github.com/moby/sys/mount),
and mount/unmount helpers. Use these where appropriate, and convert error
checks to use errors.Is.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using fmt.Errorf for errors that do not have %-style formatting
directives is overkill. Switch to errors.New.
Found by
git grep fmt.Errorf | grep -v ^vendor | grep -v '%'
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Currently, runc puts dump/restore logs in c.root by default, which is deleted
when the container exits. So if checkpointing/restoring fails, we cannot get
these logs to analyze why.
This patch lets criu use its default if --work-path is not set:
- Use WorkDirectory found in criu's configfile.
- Use ImageDirectory.
Signed-off-by: Liu Hua <weldonliu@tencent.com>
Restoring an SELinux-enabled container with Podman will result in
a container with exactly the same SELinux process labels as during
checkpointing. CRIU takes care of all the process labels.
Restoring multiple copies of a checkpointed container will result in all
containers having the same SELinux process labels, which might be
undesired.
When looking at Pods, all containers in a Pod share the process label
of the infrastructure container. To restore a container into an
existing Pod, it is necessary to tell CRIU to restore the container
with the infrastructure container's process label.
CRIU has supported setting different process labels using --lsm-profile
for a long time, and this just passes the process label information from
runc to CRIU.
Unfortunately, CRIU has a bug (as no one was using the --lsm-profile
option), so this change requires the upcoming CRIU version 3.16.
Signed-off-by: Adrian Reber <areber@redhat.com>
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.
Brought to you by
git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w
Looking at the diff, all these changes make sense.
Also, replace gofmt with gofumpt in golangci.yml.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>