Commit Graph

81 Commits

Author SHA1 Message Date
Kir Kolyshkin
a7feb42395 libct/int: add TestFdLeaksSystemd
Add a test to check that container.Run do not leak file descriptors.

Before the previous commit, it fails like this:

    exec_test.go:2030: extra fd 8 -> socket:[659703]
    exec_test.go:2030: extra fd 11 -> socket:[658715]
    exec_test.go:2033: found 2 extra fds after container.Run

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-06 12:37:55 -07:00
Kir Kolyshkin
6faed0e486 libct/int: use ok(t, err)
... in all the places it makes sense to use it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-15 13:03:17 -07:00
Kir Kolyshkin
af3c5699a5 libct/int: remove unused code
Since commit 88e8350de2 the error message is different, so the check
is not working. In addition, for the cgroup v2 case, and it seems that
PID controller is always available these days.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-15 12:46:17 -07:00
Kir Kolyshkin
7b802a7da4 libct/int: better test container names
1. Do not create the same container named "test" over and over.

2. Fix randomization issues when generating container and cgroup names.
   The issues were:

    * math/rand used without seeding
    * complex rand/md5/hexencode sequence

   In both cases, replace with nanosecond time encoded with digits and
   lowercase letters.

3. Add test name to container and cgroup names. For example, this is
   how systemd log has changed:

   Before: Started libcontainer container test16ddfwutxgjte.
   After: Started libcontainer container TestPidsSystemd-4oaqvr.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-15 12:37:59 -07:00
Qiang Huang
2d38476c96 Merge pull request #2840 from kolyshkin/ignore-kmem
Ignore kernel memory settings
2021-04-13 09:44:14 +08:00
Kir Kolyshkin
52390d6804 Ignore kernel memory settings
This is somewhat radical approach to deal with kernel memory.

Per-cgroup kernel memory limiting was always problematic. A few
examples:

 - older kernels had bugs and were even oopsing sometimes (best example
   is RHEL7 kernel);
 - kernel is unable to reclaim the kernel memory so once the limit is
   hit a cgroup is toasted;
 - some kernel memory allocations don't allow failing.

In addition to that,

 - users don't have a clue about how to set kernel memory limits
   (as the concept is much more complicated than e.g. [user] memory);
 - different kernels might have different kernel memory usage,
   which is sort of unexpected;
 - cgroup v2 do not have a [dedicated] kmem limit knob, and thus
   runc silently ignores kernel memory limits for v2;
 - kernel v5.4 made cgroup v1 kmem.limit obsoleted (see
   https://github.com/torvalds/linux/commit/0158115f702b).

In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL, let's ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).

This should result in less bugs and better user experience.

The only bad side effect from it might be that stat can show kernel
memory usage as 0 (since the accounting is not enabled).

[v2: add a warning in specconv that limits are ignored]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-12 12:18:11 -07:00
Kir Kolyshkin
79a8647b81 libct/int: add TestFdLeaks
This is a very simple test that checks that container.Run do not leak
opened file descriptors.

In fact it does, so we have to add two exclusions:

1. /sys/fs/cgroup is opened once per lifetime in prepareOpenat2(),
    provided that cgroupv2 is used and openat2 is available. This
    works as intended ("it's not a bug, it's a feature").

2. ebpf program fd is leaked every time we call setDevices() for
   cgroupv2 (iow, every container.Run or container.Set leaks 1 fd).
   This needs to be fixed, thus FIXME.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-03-30 19:58:09 -07:00
Kir Kolyshkin
3de5c51454 libct/int: don't hardcode CAP_NET_ADMIN
... use the one from unix instead.

Coincidentally, this fixes this warning from gosimple linter:

> libcontainer/integration/exec_test.go:448:2: S1021: should merge variable declaration with assignment on next line (gosimple)
>	var netAdminBit uint
>	^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-03 10:24:27 -08:00
Kir Kolyshkin
3387422bf9 libct/int: fix "simple" linter warnings
This fixes the following warnings:

> libcontainer/integration/exec_test.go:369:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputStatus := string(stdout.Bytes())
>	                ^
> libcontainer/integration/exec_test.go:422:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputStatus := string(stdout.Bytes())
>	                ^
> libcontainer/integration/exec_test.go:486:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputGroups := string(stdout.Bytes())
>	                ^
> libcontainer/integration/execin_test.go:191:18: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	outputGroups := string(stdout.Bytes())
>	                ^
> libcontainer/integration/execin_test.go:474:9: S1030: should use stdout.String() instead of string(stdout.Bytes()) (gosimple)
>	out := string(stdout.Bytes())
>	       ^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-03 10:24:27 -08:00
Mauricio Vásquez
ac5ec5e32f libcontainer/integration: fix unit test
Fix a merge issue between 0aa0fae393 ("Kill all processes in cgroup even if init process Wait fails")
& 73d93eeb01 ("libct/int: make newTemplateConfig argument a struct") that
resulted in passing the wrong datatype to newTemplateConfig in
TestPIDHostInitProcessWait.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
2020-10-23 07:56:11 -05:00
Mrunal Patel
07e35a7a40 Merge pull request #2600 from kolyshkin/libct-int-wut
libcontainer/integration: fix cgroupv1 + systemd tests
2020-10-22 21:05:15 -07:00
Kir Kolyshkin
44f221e2fc Merge pull request #2633 from chaitanyabandi/2632-fix
Kill processes in cgroup even if process Wait fails
2020-10-08 10:51:13 -07:00
Chaitanya Bandi
0aa0fae393 Kill all processes in cgroup even if init process Wait fails
If the cgroup's init process doesn't complete successfully, Wait returns a
non-nil error. We should still kill all the process in the cgroup if process
namespace is shared. Otherwise, it may result in process leak.

Fixes #2632

Signed-off-by: Chaitanya Bandi <kbandi@cs.stonybrook.edu>
2020-10-08 01:26:34 +00:00
Amim Knabben
978fa6e906 Fixing some lint issues
Signed-off-by: Amim Knabben <amim.knabben@gmail.com>
2020-10-06 14:44:14 -04:00
Kir Kolyshkin
fa47f95872 libct/int/newTemplateConfig: add systemd support
... and properly set ScopePrefix, Name and Parent for a systemd case.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 20:51:02 -07:00
Kir Kolyshkin
9135d99c94 libct/int/newTemplateConfig: add userns param
It seems that a few tests add a cgroup mount in case userns is not set.
Let's do it inside newTemplateConfig() for all tests.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 20:51:02 -07:00
Kir Kolyshkin
73d93eeb01 libct/int: make newTemplateConfig argument a struct
...so we can add more fields later.

This commit is mostly courtesy of

sed -i 's/newTemplateConfig(rootfs)/newTemplateConfig(\&tParam{rootfs: rootfs})/g'

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 20:51:02 -07:00
Sebastiaan van Stijn
28b452bf65 libcontainer: unconvert
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-01 18:36:56 +02:00
Sebastiaan van Stijn
b3a8b0742c libcontainer: prefer bytes.TrimSpace() over strings.TrimSpace()
Perform trimming before converting to a string, which should be
somewhat more performant.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-01 18:36:53 +02:00
Sebastiaan van Stijn
8bf216728c use string-concatenation instead of sprintf for simple cases
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-30 10:51:59 +02:00
Kir Kolyshkin
b006f4a180 libct/cgroups: support Cgroups.Resources.Unified
Add support for unified resource map (as per [1]), and add some test
cases for the new functionality.

[1] https://github.com/opencontainers/runtime-spec/pull/1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-24 15:29:35 -07:00
Kir Kolyshkin
2de0b5aaf3 libct/integration: enable some tests for cgroupv2
The only two tests that are still skipped on v2 are kmem
and invalid CpuShares test -- since v2 does not support either.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-08-08 19:19:31 -07:00
Renaud Gaubert
2f7bdf9d3b Tests the new Hook
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-06-19 02:39:20 +00:00
Mrunal Patel
33c6125da6 systemd: Export IsSystemdRunning() function
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2020-03-30 15:24:06 -07:00
Boris Popovschi
3b992087b8 Fix skip message for cgroupv2
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-03 14:27:12 +02:00
Akihiro Suda
ccd4436fc4 .travis.yml: add Fedora 31 vagrant box (for cgroup2)
As the baby step, only unit tests are executed.

Failing tests are currently skipped and will be fixed in follow-up PRs.

Fix #2124

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-31 16:53:01 +09:00
Aleksa Sarai
565325fc36 integration: fix mis-use of libcontainer.Factory
For some reason, libcontainer/integration has a whole bunch of incorrect
usages of libcontainer.Factory -- causing test failures with a set of
security patches that will be published soon. Fixing ths is fairly
trivial (switch to creating a new libcontainer.Factory once in each
process, rather than creating one in TestMain globally).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-01-24 23:12:48 +13:00
W. Trevor King
e23868603a libcontainer: Set 'status' in hook stdin
Finish off the work started in a344b2d6 (sync up `HookState` with OCI
spec `State`, 2016-12-19, #1201).

And drop HookState, since there's no need for a local alias for
specs.State.

Also set c.initProcess in newInitProcess to support OCIState calls
from within initProcess.start().  I think the cyclic references
between linuxContainer and initProcess are unfortunate, but didn't
want to address that here.

I've also left the timing of the Prestart hooks alone, although the
spec calls for them to happen before start (not as part of creation)
[1,2].  Once the timing gets fixed we can drop the
initProcessStartTime hacks which initProcess.start currently needs.

I'm not sure why we trigger the prestart hooks in response to both
procReady and procHooks.  But we've had two prestart rounds in
initProcess.start since 2f276498 (Move pre-start hooks after container
mounts, 2016-02-17, #568).  I've left that alone too.

I really think we should have len() guards to avoid computing the
state when .Hooks is non-nil but the particular phase we're looking at
is empty.  Aleksa, however, is adamantly against them [3] citing a
risk of sloppy copy/pastes causing the hook slice being len-guarded to
diverge from the hook slice being iterated over within the guard.  I
think that ort of thing is very lo-risk, because:

* We shouldn't be copy/pasting this, right?  DRY for the win :).
* There's only ever a few lines between the guard and the guarded
  loop.  That makes broken copy/pastes easy to catch in review.
* We should have test coverage for these.  Guarding with the wrong
  slice is certainly not the only thing you can break with a sloppy
  copy/paste.

But I'm not a maintainer ;).

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-11-14 06:49:49 -08:00
Mrunal Patel
4769cdf607 Merge pull request #1916 from crosbymichael/cgns
Add support for cgroup namespace
2018-11-13 12:21:38 -08:00
Yuanhong Peng
df3fa115f9 Add support for cgroup namespace
Cgroup namespace can be configured in `config.json` as other
namespaces. Here is an example:

```
"namespaces": [
	{
		"type": "pid"
	},
	{
		"type": "network"
	},
	{
		"type": "ipc"
	},
	{
		"type": "uts"
	},
	{
		"type": "mount"
	},
	{
		"type": "cgroup"
	}
],

```

Note that if you want to run a container which has shared cgroup ns with
another container, then it's strongly recommended that you set
proper `CgroupsPath` of both containers(the second container's cgroup
path must be the subdirectory of the first one). Or there might be
some unexpected results.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-10-31 10:51:43 -04:00
Dominik Süß
0b412e9482 various cleanups to address linter issues
Signed-off-by: Dominik Süß <dominik@suess.wtf>
2018-10-13 21:14:03 +02:00
Mrunal Patel
bd3c4f844a Fix race in runc exec
There is a race in runc exec when the init process stops just before
the check for the container status. It is then wrongly assumed that
we are trying to start an init process instead of an exec process.

This commit add an Init field to libcontainer Process to distinguish
between init and exec processes to prevent this race.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-06-01 16:25:58 -07:00
Aleksa Sarai
fd3a6e6c83 libcontainer: handle unset oomScoreAdj corectly
Previously if oomScoreAdj was not set in config.json we would implicitly
set oom_score_adj to 0. This is not allowed according to the spec:

> If oomScoreAdj is not set, the runtime MUST NOT change the value of
> oom_score_adj.

Change this so that we do not modify oom_score_adj if oomScoreAdj is not
present in the configuration. While this modifies our internal
configuration types, the on-disk format is still compatible.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-03-17 13:53:42 +11:00
Tobias Klauser
24a4273cf9 libcontainer: handle error cases
Handle err return value of fmt.Scanf, os.Pipe and unix.ParseUnixRights.

Found with honnef.co/go/tools/cmd/staticcheck

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-07-28 15:13:11 +02:00
Christy Perez
3d7cb4293c Move libcontainer to x/sys/unix
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.

There are still some dependencies on the syscall package that will
remain in syscall for the forseeable future:

Errno
Signal
SysProcAttr

Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:35:20 -05:00
Daniel Dao
09c72cea69 fix panic regression when config doesnt have caps
When process config doesnt specify capabilities anywhere, we should not panic
because setting capabilities are optional.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2017-03-21 00:45:26 +00:00
Mrunal Patel
4f9cb13b64 Update runtime spec to 1.0.0.rc5
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-15 11:38:37 -07:00
Aleksa Sarai
816efe0abd *: fix go-vet failures
Previously, we would get failures with go-vet with test files.

% go vet ./...
libcontainer/integration/exec_test.go:42: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:43: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:184: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:185: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1568: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1569: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1600: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1601: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:92: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:93: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:506: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:507: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-01-04 09:48:32 +11:00
Zhang Wei
a344b2d6a8 sync up HookState with OCI spec State
`HookState` struct should follow definition of `State` in runtime-spec:

* modify json name of `version` to `ociVersion`.
* Remove redundant `Rootfs` field as rootfs can be retrived from
`bundlePath/config.json`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-12-20 00:00:43 +08:00
Aleksa Sarai
bda3055055 *: update busybox test rootfs
Switch to the actual source of the official Docker library of images, so
that we have a proper source for the test filesystem. In addition,
update to the latest released version (1.25.0 [2016-06-23]) so that we
can use more up-to-date applets in our tests (such as stat(3)).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Qiang Huang
b15668b36d Fix all typos found by misspell
I use the same tool (https://github.com/client9/misspell)
as Daniel used a few days ago, don't why he missed these
typos at that time.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 14:14:42 +08:00
Mrunal Patel
c4e7f01c4b Add an integration test for tmpfs copy up
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-04 11:26:37 -07:00
xiekeyang
2fcbb5a494 move util function
Signed-off-by: xiekeyang <xiekeyang@huawei.com>
2016-08-19 16:08:06 +08:00
Zhao Lei
f2c4c4ad35 integration_testing: Fix a output typo
s/destory/destroy for error message output.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
2016-07-20 11:17:13 +08:00
Petar Petrov
f9b72b1b46 Allow additional groups to be overridden in exec
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Signed-off-by: Petar Petrov <pppepito86@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2016-06-21 10:35:11 +03:00
Michael Crosby
1d61abea46 Allow delete of created container
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-02 12:26:12 -07:00
Michael Crosby
efcd73fb5b Fix signal handling for unit tests
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:10:47 -07:00
George Lestaris
f7ae27bfb7 HookState adhears to OCI
Signed-off-by: George Lestaris <glestaris@pivotal.io>
Signed-off-by: Ed King <eking@pivotal.io>
2016-04-06 16:57:59 +01:00
Julian Friedman
e91b2b8aca Set rlimits using prlimit in parent
Fixes #680

This changes setupRlimit to use the Prlimit syscall (rather than
Setrlimit) and moves the call to the parent process. This is necessary
because Setrlimit would affect the libcontainer consumer if called in
the parent, and would fail if called from the child if the
child process is in a user namespace and the requested rlimit is higher
than that in the parent.

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
2016-03-25 15:11:44 +00:00
Jessica Frazelle
2c5b10189c remove deadcode
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-03-17 13:36:28 -07:00