This addresses the following TODO in the code (added back in 2015
by commit 845fc65e5):
> // TODO: fix libcontainer's API to better support uid/gid in a typesafe way.
Historically, libcontainer internally uses strings for user, group, and
additional (aka supplementary) groups.
Yet, runc receives those credentials as part of runtime-spec's process,
which uses integers for all of them (see [1], [2]).
What happens next is:
1. runc start/run/exec converts those credentials to strings (a User
string containing "UID:GID", and a []string for additional GIDs) and
passes those onto runc init.
2. runc init converts them back to int, in the most complicated way
possible (parsing container's /etc/passwd and /etc/group).
All this conversion and, especially, parsing is totally unnecessary,
but is performed on every container exec (and start).
The only benefit of all this is, a libcontainer user could use user and
group names instead of numeric IDs (but runc itself is not using this
feature, and we don't know if there are any other users of this).
Let's remove this back and forth translation, hopefully increasing
runc exec performance.
The only remaining need to parse /etc/passwd is to set HOME environment
variable for a specified UID, in case $HOME is not explicitly set in
process.Env. This can now be done right in prepareEnv, which simplifies
the code flow a lot. Alas, we can not use standard os/user.LookupId, as
it could cache host's /etc/passwd or the current user (even with the
osusergo tag).
PS Note that the structures being changed (initConfig and Process) are
never saved to disk as JSON by runc, so there is no compatibility issue
for runc users.
Still, this is a breaking change in libcontainer, but we never promised
that libcontainer API will be stable (and there's a special package
that can handle it -- github.com/moby/sys/user). Reflect this in
CHANGELOG.
For 3998.
[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#posix-platform-user
[2]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/specs-go/config.go#L86
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a slight refactor of TestExecInEnvironment, making it more
strict wrt checking the exec output.
1. Explain why DEBUG is added twice to the env.
2. Reuse the execEnv for the check.
3. Make the check more strict -- instead of looking for substrings,
check line by line.
4. Add a check for extra environment variables.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
By definition, every container has only 1 init (i.e. PID 1) process.
Apparently, libcontainer API supported running more than 1 init, and
at least one tests mistakenly used it.
Let's not allow that, erroring out if we already have init. Doing
otherwise _probably_ results in some confusion inside the library.
Fix two cases in libct/int which ran two inits inside a container.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This includes quite a few cleanups and improvements to the way we do
synchronisation. The core behaviour is unchanged, but switching to
embedding json.RawMessage into the synchronisation structure will allow
us to do more complicated synchronisation operations in future patches.
The file descriptor passing through the synchronisation system feature
will be used as part of the idmapped-mount and bind-mount-source
features when switching that code to use the new mount API outside of
nsexec.c.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Currently, TestInit sets up logrus, and init uses it to log an error
from StartInitialization(). This is solely used by TestExecInError
to check that error returned from StartInitialization is the one it
expects.
Note that the very same error is communicated to the runc init parent
and is ultimately returned by container.Run(), so checking what
StartInitialization returned is redundant.
Remove logrus setup and use from TestMain/init.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use t.TempDir instead of ioutil.TempDir. This means no need for an
explicit cleanup, which removes some code, including newTestBundle
and newTestRoot.
2. Move newRootfs invocation down to newTemplateConfig, removing a need
for explicit rootfs creation. Also, remove rootfs from tParam as it
is no longer needed (there was a since test case in which two
containers shared the same rootfs, but it does not look like it's
required for the test).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.
Brought to you by
git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w
Looking at the diff, all these changes make sense.
Also, replace gofmt with gofumpt in golangci.yml.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Do not create the same container named "test" over and over.
2. Fix randomization issues when generating container and cgroup names.
The issues were:
* math/rand used without seeding
* complex rand/md5/hexencode sequence
In both cases, replace with nanosecond time encoded with digits and
lowercase letters.
3. Add test name to container and cgroup names. For example, this is
how systemd log has changed:
Before: Started libcontainer container test16ddfwutxgjte.
After: Started libcontainer container TestPidsSystemd-4oaqvr.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As buf is instantiated outside the loop, it is appended to,
so if/once an error happens, it contains the output of all previous
iterations. Not a big problem but looks a bit untidy.
Move the declaration to inside the loop.
Fixes: 06a684d6a7
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This reverts most of commit 24c05b7, as otherwise it causes
a few regressions (docker cli, TestDockerSwarmSuite/TestServiceLogsTTY).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Simplify the tty code by using 1 goroutine instead of 2.
Improve error reporting by wrapping the errors.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The TestExecInTTY test case is sometimes failing like this:
> execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\r\n 7 root 0:00 ps\r\n"
or this:
> execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\n 7 root 0:00 ps\n"
(this is easy to repro with `go test -run TestExecInTTY -count 1000`).
This is caused by a race between
- an Init() (in this case it is is (*linuxSetnsInit.Init(), but
(*linuxStandardInit).Init() is no different in this regard),
which creates a pty pair, sends pty master to runc, and execs
the container process,
and
- a parent runc process, which receives the pty master fd and calls
ClearONLCR() on it.
One way of fixing it would be to add a synchronization mechanism
between these two, so Init() won't exec the process until the parent
sets the flag. This seems excessive, though, as we can just move
the ClearONLCR() call to Init(), putting it right after console.NewPty().
Note that bug only happens in the TestExecInTTY test case, but
from looking at the code it seems like it can happen in runc run
or runc exec, too.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fixes the following warnings:
> libcontainer/integration/exec_test.go:369:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
> outputStatus := string(stdout.Bytes())
> ^
> libcontainer/integration/exec_test.go:422:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
> outputStatus := string(stdout.Bytes())
> ^
> libcontainer/integration/exec_test.go:486:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
> outputGroups := string(stdout.Bytes())
> ^
> libcontainer/integration/execin_test.go:191:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
> outputGroups := string(stdout.Bytes())
> ^
> libcontainer/integration/execin_test.go:474:9: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
> out := string(stdout.Bytes())
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It seems that a few tests add a cgroup mount in case userns is not set.
Let's do it inside newTemplateConfig() for all tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
...so we can add more fields later.
This commit is mostly courtesy of
sed -i 's/newTemplateConfig(rootfs)/newTemplateConfig(\&tParam{rootfs: rootfs})/g'
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There is a race in runc exec when the init process stops just before
the check for the container status. It is then wrongly assumed that
we are trying to start an init process instead of an exec process.
This commit add an Init field to libcontainer Process to distinguish
between init and exec processes to prevent this race.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Previously if oomScoreAdj was not set in config.json we would implicitly
set oom_score_adj to 0. This is not allowed according to the spec:
> If oomScoreAdj is not set, the runtime MUST NOT change the value of
> oom_score_adj.
Change this so that we do not modify oom_score_adj if oomScoreAdj is not
present in the configuration. While this modifies our internal
configuration types, the on-disk format is still compatible.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Handle err return value of fmt.Scanf, os.Pipe and unix.ParseUnixRights.
Found with honnef.co/go/tools/cmd/staticcheck
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This moves all console code to use github.com/containerd/console library to
handle console I/O. Also move to use EpollConsole by default when user requests
a terminal so we can still cope when the other side temporarily goes away.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
Updated logrus to use v1 which includes a breaking name change Sirupsen -> sirupsen.
This includes a manual edit of the docker term package to also correct the name there too.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
And use it only in local tooling that is forwarding the pseudoterminal
master. That way runC no longer has an opinion on the onlcr setting
for folks who are creating a terminal and detaching. They'll use
--console-socket and can setup the pseudoterminal however they like
without runC having an opinion. With this commit, the only cases
where runC still has applies SaneTerminal is when *it* is the process
consuming the master descriptor.
Signed-off-by: W. Trevor King <wking@tremily.us>
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.
There are still some dependencies on the syscall package that will
remain in syscall for the forseeable future:
Errno
Signal
SysProcAttr
Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.
Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
This fixes all of the tests that were broken as part of the console
rewrite. This includes fixing the integration tests that used TTY
handling inside libcontainer, as well as the bats integration tests that
needed to be rewritten to use recvtty (as they rely on detached
containers that are running).
This patch is part of the console rewrite patchset.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.
In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.
We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.
This patch is part of the console rewrite patchset.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
In user namespaces, we need to make sure we don't chown() the console to
unmapped users. This means we need to get both the UID and GID of the
root user in the container when changing the owner.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Fixes#680
This changes setupRlimit to use the Prlimit syscall (rather than
Setrlimit) and moves the call to the parent process. This is necessary
because Setrlimit would affect the libcontainer consumer if called in
the parent, and would fail if called from the child if the
child process is in a user namespace and the requested rlimit is higher
than that in the parent.
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Exec erros from the exec() syscall in the container's init should be
treated as if the container ran but couldn't execute the process for the
user instead of returning a libcontainer error as if it was an issue in
the library.
Before specifying different commands like `/etc`, `asldfkjasdlfj`, or
`/alsdjfkasdlfj` would always return 1 on the command line with a
libcontainer specific error message. Now they return the correct
message and exit status defined for unix processes.
Example:
```bash
root@deathstar:/containers/redis# runc start test
exec: "/asdlfkjasldkfj": file does not exist
root@deathstar:/containers/redis# echo $?
127
root@deathstar:/containers/redis# runc start test
exec: "asdlfkjasldkfj": executable file not found in $PATH
root@deathstar:/containers/redis# echo $?
127
root@deathstar:/containers/redis# runc start test
exec: "/etc": permission denied
root@deathstar:/containers/redis# echo $?
126
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This rather naively fixes an error observed where a processes stdio
streams are not written to when there is an error upon starting up the
process, such as when the executable doesn't exist within the
container's rootfs.
Before the "fix", when an error occurred on start, `terminate` is called
immediately, which calls `cmd.Process.Kill()`, then calling `Wait()` on
the process. In some cases when this `Kill` is called the stdio stream
have not yet been written to, causing non-deterministic output. The
error itself is properly preserved but users attached to the process
will not see this error.
With the fix it is just calling `Wait()` when an error occurs rather
than trying to `Kill()` the process first. This seems to preserve stdio.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>