Commit Graph

95 Commits

Author SHA1 Message Date
Kir Kolyshkin
953e56c56f libct/int: runContainer: drop console arg
It is not and was never ever used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
972aea3af0 libct/configs/validate: allow / in sysctl names
Runtime spec says:

> sysctl (object, OPTIONAL) allows kernel parameters to be modified at
> runtime for the container. For more information, see the sysctl(8)
> man page.

and sysctl(8) says:

> variable
>    The name of a key to read from. An example is
>    kernel.ostype. The '/' separator is also accepted in place of a '.'.

Apparently, runc config validator do not support sysctls with / as a
separator. Fortunately this is a one-line fix.

Add some more test data where / is used as a separator.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-29 09:45:55 -07:00
Akihiro Suda
95f8ecdd53 fix libcontainer/integration/exec_test.go:1859:8: undefined: ioutil
Fix 4d17654479

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-10-28 14:56:03 +09:00
Akihiro Suda
4d17654479 Merge pull request #2576 from kinvolk/alban/userns-2484-take2
Open bind mount sources from the host userns
2021-10-28 14:50:33 +09:00
Mauricio Vásquez
8542322dfe libcontainer: Add unit tests with userns and mounts
Add a unit test to check that bind mounts that have a part of its
path non accessible by others still work when using user namespaces.

To do this, we also modify newRoot() to return rootfs directories that
can be traverse by others, so the rootfs created works for all test
(either running in a userns or not).

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-10-16 17:29:33 +02:00
Kir Kolyshkin
5516294172 Remove io/ioutil use
See https://golang.org/doc/go1.16#ioutil

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-14 13:46:02 -07:00
Kir Kolyshkin
3bc606e9d3 libct/int: adapt to Go 1.15
1. Use t.TempDir instead of ioutil.TempDir. This means no need for an
   explicit cleanup, which removes some code, including newTestBundle
   and newTestRoot.

2. Move newRootfs invocation down to newTemplateConfig, removing a need
   for explicit rootfs creation. Also, remove rootfs from tParam as it
   is no longer needed (there was a since test case in which two
   containers shared the same rootfs, but it does not look like it's
   required for the test).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-27 01:41:47 -07:00
Kir Kolyshkin
5dc3260431 libct/int/TestFreeze: test freeze/thaw via Set
In addition to freezing and thawing a container via Pause/Resume,
there is a way to also do so via Set.

This way was broken though and is being fixed by a few preceding
commits. The test is added to make sure this is fixed and won't regress.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-14 23:42:35 -07:00
Kir Kolyshkin
e969d42156 libct/int/testPids: logging nits
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-02 17:31:03 -07:00
Kir Kolyshkin
e6048715e4 Use gofumpt to format code
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.

Brought to you by

	git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w

Looking at the diff, all these changes make sense.

Also, replace gofmt with gofumpt in golangci.yml.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-01 12:17:27 -07:00
Aleksa Sarai
ed4781029f merge branch 'pr-2781'
Sebastiaan van Stijn (7):
  errcheck: utils
  errcheck: signals
  errcheck: tty
  errcheck: libcontainer
  errcheck: libcontainer/nsenter
  errcheck: libcontainer/configs
  errcheck: libcontainer/integration

LGTM: AkihiroSuda cyphar
Closes #2781
2021-05-25 12:31:52 +10:00
Aleksa Sarai
54904516e6 libcontainer: fix integration failure in "make test"
When running inside a Docker container, systemd is not available. The
new TestFdLeaksSystemd forgot to include the relevant t.Skip section.

Fixes: a7feb42395 ("libct/int: add TestFdLeaksSystemd")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-05-23 17:55:09 +10:00
Aleksa Sarai
c7c70ce810 *: clean t.Skip messages
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-05-23 17:53:01 +10:00
Sebastiaan van Stijn
a899505377 errcheck: libcontainer/integration
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-05-20 14:17:40 +02:00
Kir Kolyshkin
a7feb42395 libct/int: add TestFdLeaksSystemd
Add a test to check that container.Run do not leak file descriptors.

Before the previous commit, it fails like this:

    exec_test.go:2030: extra fd 8 -> socket:[659703]
    exec_test.go:2030: extra fd 11 -> socket:[658715]
    exec_test.go:2033: found 2 extra fds after container.Run

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-06 12:37:55 -07:00
Kir Kolyshkin
6faed0e486 libct/int: use ok(t, err)
... in all the places it makes sense to use it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-15 13:03:17 -07:00
Kir Kolyshkin
af3c5699a5 libct/int: remove unused code
Since commit 88e8350de2 the error message is different, so the check
is not working. In addition, for the cgroup v2 case, and it seems that
PID controller is always available these days.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-15 12:46:17 -07:00
Kir Kolyshkin
7b802a7da4 libct/int: better test container names
1. Do not create the same container named "test" over and over.

2. Fix randomization issues when generating container and cgroup names.
   The issues were:

    * math/rand used without seeding
    * complex rand/md5/hexencode sequence

   In both cases, replace with nanosecond time encoded with digits and
   lowercase letters.

3. Add test name to container and cgroup names. For example, this is
   how systemd log has changed:

   Before: Started libcontainer container test16ddfwutxgjte.
   After: Started libcontainer container TestPidsSystemd-4oaqvr.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-15 12:37:59 -07:00
Qiang Huang
2d38476c96 Merge pull request #2840 from kolyshkin/ignore-kmem
Ignore kernel memory settings
2021-04-13 09:44:14 +08:00
Kir Kolyshkin
52390d6804 Ignore kernel memory settings
This is somewhat radical approach to deal with kernel memory.

Per-cgroup kernel memory limiting was always problematic. A few
examples:

 - older kernels had bugs and were even oopsing sometimes (best example
   is RHEL7 kernel);
 - kernel is unable to reclaim the kernel memory so once the limit is
   hit a cgroup is toasted;
 - some kernel memory allocations don't allow failing.

In addition to that,

 - users don't have a clue about how to set kernel memory limits
   (as the concept is much more complicated than e.g. [user] memory);
 - different kernels might have different kernel memory usage,
   which is sort of unexpected;
 - cgroup v2 do not have a [dedicated] kmem limit knob, and thus
   runc silently ignores kernel memory limits for v2;
 - kernel v5.4 made cgroup v1 kmem.limit obsoleted (see
   https://github.com/torvalds/linux/commit/0158115f702b).

In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL, let's ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).

This should result in less bugs and better user experience.

The only bad side effect from it might be that stat can show kernel
memory usage as 0 (since the accounting is not enabled).

[v2: add a warning in specconv that limits are ignored]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-12 12:18:11 -07:00
Kir Kolyshkin
79a8647b81 libct/int: add TestFdLeaks
This is a very simple test that checks that container.Run do not leak
opened file descriptors.

In fact it does, so we have to add two exclusions:

1. /sys/fs/cgroup is opened once per lifetime in prepareOpenat2(),
    provided that cgroupv2 is used and openat2 is available. This
    works as intended ("it's not a bug, it's a feature").

2. ebpf program fd is leaked every time we call setDevices() for
   cgroupv2 (iow, every container.Run or container.Set leaks 1 fd).
   This needs to be fixed, thus FIXME.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-03-30 19:58:09 -07:00
Kir Kolyshkin
3de5c51454 libct/int: don't hardcode CAP_NET_ADMIN
... use the one from unix instead.

Coincidentally, this fixes this warning from gosimple linter:

> libcontainer/integration/exec_test.go:448:2: S1021: should merge variable declaration with assignment on next line (gosimple)
>	var netAdminBit uint
>	^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-03 10:24:27 -08:00
Kir Kolyshkin
3387422bf9 libct/int: fix "simple" linter warnings
This fixes the following warnings:

> libcontainer/integration/exec_test.go:369:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputStatus := string(stdout.Bytes())
>	                ^
> libcontainer/integration/exec_test.go:422:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputStatus := string(stdout.Bytes())
>	                ^
> libcontainer/integration/exec_test.go:486:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputGroups := string(stdout.Bytes())
>	                ^
> libcontainer/integration/execin_test.go:191:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputGroups := string(stdout.Bytes())
>	                ^
> libcontainer/integration/execin_test.go:474:9: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	out := string(stdout.Bytes())
>	       ^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-03 10:24:27 -08:00
Mauricio Vásquez
ac5ec5e32f libcontainer/integration: fix unit test
Fix a merge issue between 0aa0fae393 ("Kill all processes in cgroup even if init process Wait fails")
& 73d93eeb01 ("libct/int: make newTemplateConfig argument a struct") that
resulted in passing the wrong datatype to newTemplateConfig in
TestPIDHostInitProcessWait.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
2020-10-23 07:56:11 -05:00
Mrunal Patel
07e35a7a40 Merge pull request #2600 from kolyshkin/libct-int-wut
libcontainer/integration: fix cgroupv1 + systemd tests
2020-10-22 21:05:15 -07:00
Kir Kolyshkin
44f221e2fc Merge pull request #2633 from chaitanyabandi/2632-fix
Kill processes in cgroup even if process Wait fails
2020-10-08 10:51:13 -07:00
Chaitanya Bandi
0aa0fae393 Kill all processes in cgroup even if init process Wait fails
If the cgroup's init process doesn't complete successfully, Wait returns a
non-nil error. We should still kill all the process in the cgroup if process
namespace is shared. Otherwise, it may result in process leak.

Fixes #2632

Signed-off-by: Chaitanya Bandi <kbandi@cs.stonybrook.edu>
2020-10-08 01:26:34 +00:00
Amim Knabben
978fa6e906 Fixing some lint issues
Signed-off-by: Amim Knabben <amim.knabben@gmail.com>
2020-10-06 14:44:14 -04:00
Kir Kolyshkin
fa47f95872 libct/int/newTemplateConfig: add systemd support
... and properly set ScopePrefix, Name and Parent for a systemd case.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 20:51:02 -07:00
Kir Kolyshkin
9135d99c94 libct/int/newTemplateConfig: add userns param
It seems that a few tests add a cgroup mount in case userns is not set.
Let's do it inside newTemplateConfig() for all tests.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 20:51:02 -07:00
Kir Kolyshkin
73d93eeb01 libct/int: make newTemplateConfig argument a struct
...so we can add more fields later.

This commit is mostly courtesy of

sed -i 's/newTemplateConfig(rootfs)/newTemplateConfig(\&tParam{rootfs: rootfs})/g'

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 20:51:02 -07:00
Sebastiaan van Stijn
28b452bf65 libcontainer: unconvert
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-01 18:36:56 +02:00
Sebastiaan van Stijn
b3a8b0742c libcontainer: prefer bytes.TrimSpace() over strings.TrimSpace()
Perform trimming before converting to a string, which should be
somewhat more performant.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-01 18:36:53 +02:00
Sebastiaan van Stijn
8bf216728c use string-concatenation instead of sprintf for simple cases
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-30 10:51:59 +02:00
Kir Kolyshkin
b006f4a180 libct/cgroups: support Cgroups.Resources.Unified
Add support for unified resource map (as per [1]), and add some test
cases for the new functionality.

[1] https://github.com/opencontainers/runtime-spec/pull/1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-24 15:29:35 -07:00
Kir Kolyshkin
2de0b5aaf3 libct/integration: enable some tests for cgroupv2
The only two tests that are still skipped on v2 are kmem
and invalid CpuShares test -- since v2 does not support either.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-08-08 19:19:31 -07:00
Renaud Gaubert
2f7bdf9d3b Tests the new Hook
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-06-19 02:39:20 +00:00
Mrunal Patel
33c6125da6 systemd: Export IsSystemdRunning() function
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2020-03-30 15:24:06 -07:00
Boris Popovschi
3b992087b8 Fix skip message for cgroupv2
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-03 14:27:12 +02:00
Akihiro Suda
ccd4436fc4 .travis.yml: add Fedora 31 vagrant box (for cgroup2)
As the baby step, only unit tests are executed.

Failing tests are currently skipped and will be fixed in follow-up PRs.

Fix #2124

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-31 16:53:01 +09:00
Aleksa Sarai
565325fc36 integration: fix mis-use of libcontainer.Factory
For some reason, libcontainer/integration has a whole bunch of incorrect
usages of libcontainer.Factory -- causing test failures with a set of
security patches that will be published soon. Fixing ths is fairly
trivial (switch to creating a new libcontainer.Factory once in each
process, rather than creating one in TestMain globally).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-01-24 23:12:48 +13:00
W. Trevor King
e23868603a libcontainer: Set 'status' in hook stdin
Finish off the work started in a344b2d6 (sync up `HookState` with OCI
spec `State`, 2016-12-19, #1201).

And drop HookState, since there's no need for a local alias for
specs.State.

Also set c.initProcess in newInitProcess to support OCIState calls
from within initProcess.start().  I think the cyclic references
between linuxContainer and initProcess are unfortunate, but didn't
want to address that here.

I've also left the timing of the Prestart hooks alone, although the
spec calls for them to happen before start (not as part of creation)
[1,2].  Once the timing gets fixed we can drop the
initProcessStartTime hacks which initProcess.start currently needs.

I'm not sure why we trigger the prestart hooks in response to both
procReady and procHooks.  But we've had two prestart rounds in
initProcess.start since 2f276498 (Move pre-start hooks after container
mounts, 2016-02-17, #568).  I've left that alone too.

I really think we should have len() guards to avoid computing the
state when .Hooks is non-nil but the particular phase we're looking at
is empty.  Aleksa, however, is adamantly against them [3] citing a
risk of sloppy copy/pastes causing the hook slice being len-guarded to
diverge from the hook slice being iterated over within the guard.  I
think that ort of thing is very lo-risk, because:

* We shouldn't be copy/pasting this, right?  DRY for the win :).
* There's only ever a few lines between the guard and the guarded
  loop.  That makes broken copy/pastes easy to catch in review.
* We should have test coverage for these.  Guarding with the wrong
  slice is certainly not the only thing you can break with a sloppy
  copy/paste.

But I'm not a maintainer ;).

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-11-14 06:49:49 -08:00
Mrunal Patel
4769cdf607 Merge pull request #1916 from crosbymichael/cgns
Add support for cgroup namespace
2018-11-13 12:21:38 -08:00
Yuanhong Peng
df3fa115f9 Add support for cgroup namespace
Cgroup namespace can be configured in `config.json` as other
namespaces. Here is an example:

```
"namespaces": [
	{
		"type": "pid"
	},
	{
		"type": "network"
	},
	{
		"type": "ipc"
	},
	{
		"type": "uts"
	},
	{
		"type": "mount"
	},
	{
		"type": "cgroup"
	}
],

```

Note that if you want to run a container which has shared cgroup ns with
another container, then it's strongly recommended that you set
proper `CgroupsPath` of both containers(the second container's cgroup
path must be the subdirectory of the first one). Or there might be
some unexpected results.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-10-31 10:51:43 -04:00
Dominik Süß
0b412e9482 various cleanups to address linter issues
Signed-off-by: Dominik Süß <dominik@suess.wtf>
2018-10-13 21:14:03 +02:00
Mrunal Patel
bd3c4f844a Fix race in runc exec
There is a race in runc exec when the init process stops just before
the check for the container status. It is then wrongly assumed that
we are trying to start an init process instead of an exec process.

This commit add an Init field to libcontainer Process to distinguish
between init and exec processes to prevent this race.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-06-01 16:25:58 -07:00
Aleksa Sarai
fd3a6e6c83 libcontainer: handle unset oomScoreAdj corectly
Previously if oomScoreAdj was not set in config.json we would implicitly
set oom_score_adj to 0. This is not allowed according to the spec:

> If oomScoreAdj is not set, the runtime MUST NOT change the value of
> oom_score_adj.

Change this so that we do not modify oom_score_adj if oomScoreAdj is not
present in the configuration. While this modifies our internal
configuration types, the on-disk format is still compatible.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-03-17 13:53:42 +11:00
Tobias Klauser
24a4273cf9 libcontainer: handle error cases
Handle err return value of fmt.Scanf, os.Pipe and unix.ParseUnixRights.

Found with honnef.co/go/tools/cmd/staticcheck

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-07-28 15:13:11 +02:00
Christy Perez
3d7cb4293c Move libcontainer to x/sys/unix
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.

There are still some dependencies on the syscall package that will
remain in syscall for the forseeable future:

Errno
Signal
SysProcAttr

Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:35:20 -05:00
Daniel Dao
09c72cea69 fix panic regression when config doesnt have caps
When process config doesnt specify capabilities anywhere, we should not panic
because setting capabilities are optional.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2017-03-21 00:45:26 +00:00