Commit Graph

158 Commits

Author SHA1 Message Date
Kir Kolyshkin
5b8f8712a4 libct: signalAllProcesses: remove child reaping
There are two very distinct usage scenarios for signalAllProcesses:

* when used from the runc binary ("runc kill" command), the processes
  that it kills are not the children of "runc kill", and so calling
  wait(2) on each process is totally useless, as it will return ECHLD;

* when used from a program that have created the container (such as
  libcontainer/integration test suite), that program can and should call
  wait(2), not the signalling code.

So, the child reaping code is totally useless in the first case, and
should be implemented by the program using libcontainer in the second
case. I was not able to track down how this code was added, my best
guess is it happened when this code was part of dockerd, which did not
have a proper child reaper implemented at that time.

Remove it, and add a proper documentation piece.

Change the integration test accordingly.

PS the first attempt to disable the child reaping code in
signalAllProcesses was made in commit bb912eb00c, which used a
questionable heuristic to figure out whether wait(2) should be called.
This heuristic worked for a particular use case, but is not correct in
general.

While at it:
 - simplify signalAllProcesses to use unix.Kill;
 - document (container).Signal.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-06-08 09:23:29 -07:00
Kir Kolyshkin
eba31a7c6c libct/StartInitialization: rename returned error
This is a cosmetic change to improve code readability, making it easier
to distinguish between a local error and the error being returned.

While at it, rename e to err (it was originally called e to not clash
with returned error named err) and ee to err2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-17 12:46:27 -07:00
Kir Kolyshkin
4f0a7e78c3 libct/init: call Init from containerInit
Instead of having newContainerInit return an interface, and let its
caller call Init(), it is easier to call Init directly.

Do that, and rename newContainerInit to containerInit.

I think it makes the code more readable and straightforward.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-17 12:46:27 -07:00
Kir Kolyshkin
72657eac2e libct: move StartInitialization
No code change, just moving a function from factory_linux.go to
init_linux.go.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-17 12:46:27 -07:00
Aleksa Sarai
20e38fb2b1 init: do not print environment variable value
When given an environment variable that is invalid, it's not a good idea
to output the contents in case they are supposed to be private (though
such a container wouldn't start anyway so it seems unlikely there's a
real way to use this to exfiltrate environment variables you didn't
already know).

Reported-by: Carl Henrik Lunde <chlunde@ifi.uio.no>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-04-28 16:32:15 +10:00
Jaroslav Jindrak
7e5e017dba libcontainer: skip chown of /dev/null caused by fd redirection
In 18c4760a (libct: fixStdioPermissions: skip chown if not needed)
the check whether the STDIO file descriptors point to /dev/null was
removed which can cause /dev/null to change ownership e.g. when using
docker exec on a running container:

$ ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Aug 1 14:12 /dev/null
$ docker exec -u test 0ad6d3064e9d ls
$ ls -l /dev/null
crw-rw-rw- 1 test root 1, 3 Aug 1 14:12 /dev/null

Signed-off-by: Jaroslav Jindrak <dzejrou@gmail.com>
2023-02-03 01:24:02 +01:00
Walt Chen
18f8f482b1 Fix comment of signalAllProcesses for process wait due to sigkill
Signed-off-by: Walt Chen <godsarmycy@gmail.com>
2022-11-02 10:08:09 -07:00
Kir Kolyshkin
485e6c84e7 Fix some revive warnings
This is needed since the future commits will touch this code, and then
the lint-extra CI job complains.

> libcontainer/factory.go#L245
> var-naming: var fdsJson should be fdsJSON (revive)

and

> libcontainer/init_linux.go#L181
> error-strings: error strings should not be capitalized or end with punctuation or a newline (revive)

and

> notify_socket.go#L94
> receiver-naming: receiver name n should be consistent with previous receiver name s for notifySocket (revive)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-26 19:14:14 -08:00
Kir Kolyshkin
bb6a838876 libct: initContainer: rename Id -> ID
Since the next commit is going to touch this structure, our CI
(lint-extra) is about to complain about improperly named field:

>  Warning: var-naming: struct field ContainerId should be ContainerID (revive)

Make it happy.

Brought to use by gopls rename.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-26 18:59:47 -08:00
Kir Kolyshkin
146c8c0c62 libct: fixStdioPermissions: ignore EROFS
In case of a read-only /dev, it's better to move on and let whatever is
run in a container to handle any possible errors.

This solves runc exec for a user with read-only /dev.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-21 17:53:03 -08:00
Kir Kolyshkin
18c4760aed libct: fixStdioPermissions: skip chown if not needed
Since we already called fstat, we know the current file uid. In case it
is the same as the one we want it to be, there's no point in trying
chown.

Remove the specific /dev/null check, as the above also covers it
(comparing /dev/null uid with itself is true).

This also fixes runc exec with read-only /dev for root user.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-21 17:46:10 -08:00
Kir Kolyshkin
b7fdb68848 libct: fixStdioPermissions: minor refactoring
Use os/file Chown method instead of bare unix.Fchown as it already have
access to underlying fd, and produces nice-looking errors. This allows
us to remove our error wrapping and some linter annotations.

We still use unix.Fstat since os.Stat access to os-specific fields
like uid/gid is not very straightforward. The only change here is to use
file name (rather than fd) in the error text.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-21 17:39:50 -08:00
Kir Kolyshkin
dd14040145 libct: fixStdioPermissions: rm config arg
Since commit ff5075c33f it is no longer used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Sebastiaan van Stijn
20e928875a Merge pull request #3275 from kolyshkin/rm-prlimit
libcontainer/system: rm Prlimit
2021-11-17 14:38:27 +01:00
Kir Kolyshkin
7563a8f06d libct: wrap more unix errors
When I tried to start a rootless container under a different/wrong user,
I got:

	$ ../runc/runc --systemd-cgroup --root /tmp/runc.$$ run 445
	ERRO[0000] runc run failed: operation not permitted

This is obviously not good enough. With this commit, the error is:

	ERRO[0000] runc run failed: fchown fd 9: operation not permitted

Alas, there are still some code that returns unwrapped errnos from
various unix calls.

This is a followup to commit d8ba4128b2 which wrapped many, but not
all, bare unix errors. Do wrap some more, using either os.PathError or
os.SyscallError.

While at it,
 - use os.SyscallError instead of os.NewSyscallError;
 - use errors.Is(err, os.ErrXxx) instead of os.IsXxx(err).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-12 00:33:59 -08:00
Kir Kolyshkin
db4ad6a7f1 libcontainer/system: rm Prlimit
It is now available from golang.org/x/sys/unix
(https://go-review.googlesource.com/c/sys/+/332029)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-11 19:57:56 -08:00
Akihiro Suda
4d17654479 Merge pull request #2576 from kinvolk/alban/userns-2484-take2
Open bind mount sources from the host userns
2021-10-28 14:50:33 +09:00
Kir Kolyshkin
5516294172 Remove io/ioutil use
See https://golang.org/doc/go1.16#ioutil

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-14 13:46:02 -07:00
Alban Crequy
9c444070ec Open bind mount sources from the host userns
The source of the bind mount might not be accessible in a different user
namespace because a component of the source path might not be traversed
under the users and groups mapped inside the user namespace. This caused
errors such as the following:

  # time="2020-06-22T13:48:26Z" level=error msg="container_linux.go:367:
  starting container process caused: process_linux.go:459:
  container init caused: rootfs_linux.go:58:
  mounting \"/tmp/busyboxtest/source-inaccessible/dir\"
  to rootfs at \"/tmp/inaccessible\" caused:
  stat /tmp/busyboxtest/source-inaccessible/dir: permission denied"

To solve this problem, this patch performs the following:

1. in nsexec.c, it opens the source path in the host userns (so we have
   the right permissions to open it) but in the container mntns (so the
   kernel cross mntns mount check let us mount it later:
   https://github.com/torvalds/linux/blob/v5.8/fs/namespace.c#L2312).

2. in nsexec.c, it passes the file descriptors of the source to the
   child process with SCM_RIGHTS.

3. In runc-init in Golang, it finishes the mounts while inside the
   userns even without access to the some components of the source
   paths.

Passing the fds with SCM_RIGHTS is necessary because once the child
process is in the container mntns, it is already in the container userns
so it cannot temporarily join the host mntns.

This patch uses the existing mechanism with _LIBCONTAINER_* environment
variables to pass the file descriptors from runc to runc init.

This patch uses the existing mechanism with the Netlink-style bootstrap
to pass information about the list of source mounts to nsexec.c.

Rootless containers don't use this bind mount sources fdpassing
mechanism because we can't setns() to the target mntns in a rootless
container (we don't have the privileges when we are in the host userns).

This patch takes care of using O_CLOEXEC on mount fds, and close them
early.

Fixes: #2484.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-10-12 15:13:45 +02:00
Alban Crequy
2b025c0173 Implement Seccomp Notify
This commit implements support for the SCMP_ACT_NOTIFY action. It
requires libseccomp-2.5.0 to work but runc still works with older
libseccomp if the seccomp policy does not use the SCMP_ACT_NOTIFY
action.

A new synchronization step between runc[INIT] and runc run is introduced
to pass the seccomp fd. runc run fetches the seccomp fd with pidfd_get
from the runc[INIT] process and sends it to the seccomp agent using
SCM_RIGHTS.

As suggested by @kolyshkin, we also make writeSync() a wrapper of
writeSyncWithFd() and wrap the error there. To avoid pointless errors,
we made some existing code paths just return the error instead of
re-wrapping it. If we don't do it, error will look like:

	writing syncT <act>: writing syncT: <err>

By adjusting the code path, now they just look like this
	writing syncT <act>: <err>

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-09-07 13:04:24 +02:00
Kir Kolyshkin
9ff64c3d97 *: rm redundant linux build tag
For files that end with _linux.go or _linux_test.go, there is no need to
specify linux build tag, as it is assumed from the file name.

In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go
for the file name to make sense.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:15:00 -07:00
Kir Kolyshkin
a7cfb23b88 *: stop using pkg/errors
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
56e478046a *: ignore errorlint warnings about unix.* errors
Errors from unix.* are always bare and thus can be used directly.

Add //nolint:errorlint annotation to ignore errors such as these:

libcontainer/system/xattrs_linux.go:18:7: comparing with == will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
	case errno == unix.ERANGE:
	     ^
libcontainer/container_linux.go:1259:9: comparing with != will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
					if e != unix.EINVAL {
					   ^
libcontainer/rootfs_linux.go:919:7: comparing with != will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
			if err != unix.EINVAL && err != unix.EPERM {
			   ^
libcontainer/rootfs_linux.go:1002:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
			switch err {
			^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
f6a0899b7f *: use errors.As and errors.Is
Do this for all errors except one from unix.*.

This fixes a bunch of errorlint warnings, like these

libcontainer/generic_error.go:25:15: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
	if le, ok := err.(Error); ok {
	             ^
libcontainer/factory_linux_test.go:145:14: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
	lerr, ok := err.(Error)
	            ^
libcontainer/state_linux_test.go:28:11: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
	_, ok := err.(*stateTransitionError)
	         ^
libcontainer/seccomp/patchbpf/enosys_linux.go:88:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
			switch err {
			^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Shiming Zhang
322c8fd36b Returns clearer error message for setenv
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-06-12 14:32:48 +08:00
Kir Kolyshkin
ff692f289b Fix cgroup2 mount for rootless case
In case of rootless, cgroup2 mount is not possible (see [1] for more
details), so since commit 9c81440fb5 runc bind-mounts the whole
/sys/fs/cgroup into container.

Problem is, if cgroupns is enabled, /sys/fs/cgroup inside the container
is supposed to show the cgroup files for this cgroup, not the root one.

The fix is to pass through and use the cgroup path in case cgroup2
mount failed, cgroupns is enabled, and the path is non-empty.

Surely this requires the /sys/fs/cgroup mount in the spec, so modify
runc spec --rootless to keep it.

Before:

	$ ./runc run aaa
	# find /sys/fs/cgroup/ -type d
	/sys/fs/cgroup
	/sys/fs/cgroup/user.slice
	/sys/fs/cgroup/user.slice/user-1000.slice
	/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service
	...
	# ls -l /sys/fs/cgroup/cgroup.controllers
	-r--r--r--    1 nobody   nogroup          0 Feb 24 02:22 /sys/fs/cgroup/cgroup.controllers
	# wc -w /sys/fs/cgroup/cgroup.procs
	142 /sys/fs/cgroup/cgroup.procs
	# cat /sys/fs/cgroup/memory.current
	cat: can't open '/sys/fs/cgroup/memory.current': No such file or directory

After:

	# find /sys/fs/cgroup/ -type d
	/sys/fs/cgroup/
	# ls -l /sys/fs/cgroup/cgroup.controllers
	-r--r--r--    1 root     root             0 Feb 24 02:43 /sys/fs/cgroup/cgroup.controllers
	# wc -w /sys/fs/cgroup/cgroup.procs
	2 /sys/fs/cgroup/cgroup.procs
	# cat /sys/fs/cgroup/memory.current
	577536

[1] https://github.com/opencontainers/runc/issues/2158

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-20 12:35:40 -07:00
Aleksa Sarai
64bb59f592 nsenter: improve debug logging
In order to make 'runc --debug' actually useful for debugging nsexec
bugs, provide information about all the internal operations when in
debug mode.

[@kolyshkin: rebasing; fix formatting via indent for make validate to pass]

Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-08 09:54:43 -07:00
Peter Hunt
6ce2d63a5d libct/init_linux: retry chdir to fix EPERM
Alas, the EPERM on chdir saga continues...

Unfortunately, the there were two releases between when 5e0e67d76c  was released
and when the workaround https://github.com/opencontainers/runc/pull/2712 was added.

Between this, folks started relying on the ability to have a workdir that the container user doesn't have access to.

Since this case was previously valid, we should continue support for it.

Now, we retry the chdir:
Once at the top of the function (to catch cases where the runc user has access, but container user does not)
and once after we setup user (to catch cases where the container user has access, and the runc user does not)

Add a test case for this as well.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-04-06 14:51:43 -04:00
Kir Kolyshkin
201d60c51d runc run/start/exec: fix init log forwarding race
Sometimes debug.bats test cases are failing like this:

> not ok 27 global --debug to --log --log-format 'json'
> # (in test file tests/integration/debug.bats, line 77)
> #   `[[ "${output}" == *"child process in init()"* ]]' failed

It happens more when writing to disk.

This issue is caused by the fact that runc spawns log forwarding goroutine
(ForwardLogs) but does not wait for it to finish, resulting in missing
debug lines from nsexec.

ForwardLogs itself, though, never finishes, because it reads from a
reading side of a pipe which writing side is not closed. This is
especially true in case of runc create, which spawns runc init and
exits; meanwhile runc init waits on exec fifo for arbitrarily long
time before doing execve.

So, to fix the failure described above, we need to:

 1. Make runc create/run/exec wait for ForwardLogs to finish;

 2. Make runc init close its log pipe file descriptor (i.e.
    the one which value is passed in _LIBCONTAINER_LOGPIPE
    environment variable).

This is exactly what this commit does:

 1. Amend ForwardLogs to return a channel, and wait for it in start().

 2. In runc init, save the log fd and close it as late as possible.

PS I have to admit I still do not understand why an explicit close of
log pipe fd is required in e.g. (*linuxSetnsInit).Init, right before
the execve which (thanks to CLOEXEC) closes the fd anyway.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-03-25 19:18:55 -07:00
Aleksa Sarai
091dd32dd1 merge branch 'pr-2607'
Sebastiaan van Stijn (1):
  libcontainer: move capabilities to separate package

LGTMs: @AkihiroSuda @cyphar
Closes #2607
2021-02-01 14:26:17 +11:00
Kir Kolyshkin
dac0c1e34a console.ClearONLCR: move it back
This reverts most of commit 24c05b7, as otherwise it causes
a few regressions (docker cli, TestDockerSwarmSuite/TestServiceLogsTTY).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-01-19 09:39:54 -08:00
Mrunal Patel
dbbe7e60c7 Merge pull request #2722 from thaJeztah/warn_freeze
libcontainer: signalAllProcesses(): log warning when failing to thaw
2021-01-13 16:42:42 -08:00
Kir Kolyshkin
24c05b71fa tty: fix ClearONLCR race
The TestExecInTTY test case is sometimes failing like this:

> execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\r\n 7 root 0:00 ps\r\n"

or this:

> execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\n 7 root 0:00 ps\n"

(this is easy to repro with `go test -run TestExecInTTY -count 1000`).

This is caused by a race between

 - an Init() (in this case it is is (*linuxSetnsInit.Init(), but
   (*linuxStandardInit).Init() is no different in this regard),
   which creates a pty pair, sends pty master to runc, and execs
   the container process,

and

 - a parent runc process, which receives the pty master fd and calls
   ClearONLCR() on it.

One way of fixing it would be to add a synchronization mechanism
between these two, so Init() won't exec the process until the parent
sets the flag. This seems excessive, though, as we can just move
the ClearONLCR() call to Init(), putting it right after console.NewPty().

Note that bug only happens in the TestExecInTTY test case, but
from looking at the code it seems like it can happen in runc run
or runc exec, too.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-01-07 13:33:00 -08:00
Sebastiaan van Stijn
039c47ab82 libcontainer: signalAllProcesses(): log warning when failing to thaw
I noticed this was the only place in this function where we didn't
handle errors on freezing/thawing. Logging as a warning, consistent
with the other cases.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-01-07 11:45:11 +01:00
Sebastiaan van Stijn
1897217731 libcontainer: move capabilities to separate package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-01-07 11:42:01 +01:00
Peter Hunt
d869d05aba libctr/init_linux: reorder chdir
commit 5e0e67d76c moved the chdir to be one of the
first steps of finalizing the namespace of the container.

However, this causes issues when the cwd is not accessible by the user running runc, but rather
as the container user.

Thus, setupUser has to happen before we call chdir. setupUser still happens before setting the caps,
so the user should be privileged enough to mitigate the issues fixed in 5e0e67d76c

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2020-12-18 12:59:02 -08:00
Sebastiaan van Stijn
e8eb8000f1 fix some linting issues
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-02 10:21:54 +02:00
Sebastiaan van Stijn
b3a8b0742c libcontainer: prefer bytes.TrimSpace() over strings.TrimSpace()
Perform trimming before converting to a string, which should be
somewhat more performant.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-01 18:36:53 +02:00
tjucoder
ab35cfe23c make sure pty.Close() will be called and fix comment
Signed-off-by: tjucoder <chinesecoder@foxmail.com>
2020-07-05 16:37:21 +08:00
Renaud Gaubert
ccdd75760c Add the CreateRuntime, CreateContainer and StartContainer Hooks
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-06-17 02:10:00 +00:00
John Hwang
7fc291fd45 Replace formatted errors when unneeded
Signed-off-by: John Hwang <John.F.Hwang@gmail.com>
2020-05-16 18:13:21 -07:00
Kir Kolyshkin
af6b9e7fa9 nit: do not use syscall package
In many places (not all of them though) we can use `unix.`
instead of `syscall.` as these are indentical.

In particular, x/sys/unix defines:

```go
type Signal = syscall.Signal
type Errno = syscall.Errno
type SysProcAttr = syscall.SysProcAttr

const ENODEV      = syscall.Errno(0x13)
```

and unix.Exec() calls syscall.Exec().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-18 16:16:49 -07:00
Kurnia D Win
5e0e67d76c fix permission denied
when exec as root and config.Cwd is not owned by root, exec will fail
because root doesn't have the caps.

So, Chdir should be done before setting the caps.

Signed-off-by: Kurnia D Win <kurnia.d.win@gmail.com>
2019-07-18 12:49:36 +07:00
Michael Crosby
aa7917b751 Merge pull request #1911 from theSuess/linter-fixes
Various cleanups to address linter issues
2018-11-13 12:13:34 -05:00
Giuseppe Scrivano
869add3318 rootless: fix running with /proc/self/setgroups set to deny
This is a regression from 06f789cf26
when the user namespace was configured without a privileged helper.
To allow a single mapping in an user namespace, it is necessary to set
/proc/self/setgroups to "deny".

For a simple reproducer, the user namespace can be created with
"unshare -r".

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-10-25 15:44:15 +02:00
Dominik Süß
0b412e9482 various cleanups to address linter issues
Signed-off-by: Dominik Süß <dominik@suess.wtf>
2018-10-13 21:14:03 +02:00
Akihiro Suda
06f789cf26 Disable rootless mode except RootlessCgMgr when executed as the root in userns
This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and
`RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc.

`RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in
the current user namespace. `RootlessEUID` is almost identical to the former `Rootless`
except cgroups stuff.

`RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups.
`RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace.
Otherwise `RootlessCgroups` is set to true.
(Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well)

When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes),
`RootlessEUID` is set to false but `RootlessCgroups` is set to true.
So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored.

This PR does not have any impact on CLI flags and `state.json`.

Note about CLI:
* Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`.
* Now `runc spec --rootless` is only required when `RootlessEUID` is set to true.
  For runc-in-userns, `runc spec`  without `--rootless` should work, when sufficient numbers of
  UID/GID are mapped.

Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`):
* `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility.
  (`/run/runc` is used)
* If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`.
  This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`.

Note about `state.json`:
* `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-09-07 15:05:03 +09:00
Michael Crosby
fd0febd3ce Wrap error messages during init
Fixes #1437

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-05-10 10:28:10 -04:00
Sebastien Boeuf
bb912eb00c libcontainer: Do not wait for signalled processes if subreaper is set
When a subreaper is enabled, it might expect to reap a process and
retrieve its exit code. That's the reason why this patch is giving
the possibility to define the usage of a subreaper as a consumer of
libcontainer. Relying on this information, libcontainer will not
wait for signalled processes in case a subreaper has been set.

Fixes #1677

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2017-12-14 10:37:38 -08:00