Commit Graph

126 Commits

Author SHA1 Message Date
Kir Kolyshkin
a56f85f87b libct/*: switch from configs to cgroups
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-12-11 19:08:40 -08:00
Kir Kolyshkin
4f3319b56d libct: decouple libct/cg/devices
Commit b6967fa84c moved the functionality of managing cgroup devices
into a separate package, and decoupled libcontainer/cgroups from it.

Yet, some software (e.g. cadvisor) may need to use libcontainer package,
which imports libcontainer/cgroups/devices, thus making it impossible to
use libcontainer without bringing in cgroup/devices dependency.

In fact, we only need to manage devices in runc binary, so move the
import to main.go.

The need to import libct/cg/dev in order to manage devices is already
documented in libcontainer/cgroups, but let's
 - update that documentation;
 - add a similar note to libcontainer/cgroups/systemd;
 - add a note to libct README.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-04-17 15:05:38 -07:00
Aleksa Sarai
ba0b5e2698 libcontainer: remove all mount logic from nsexec
With open_tree(OPEN_TREE_CLONE), it is possible to implement both the
id-mapped mounts and bind-mount source file descriptor logic entirely in
Go without requiring any complicated handling from nsexec.

However, implementing it the naive way (do the OPEN_TREE_CLONE in the
host namespace before the rootfs is set up -- which is what the existing
implementation did) exposes issues in how mount ordering (in particular
when handling mount sources from inside the container rootfs, but also
in relation to mount propagation) was handled for idmapped mounts and
bind-mount sources. In order to solve this problem completely, it is
necessary to spawn a thread which joins the container mount namespace
and provides mountfds when requested by the rootfs setup code (ensuring
that the mount order and mount propagation of the source of the
bind-mount are handled correctly). While the need to join the mount
namespace leads to other complicated (such as with the usage of
/proc/self -- fixed in a later patch) the resulting code is still
reasonable and is the only real way to solve the issue.

This allows us to reduce the amount of C code we have in nsexec, as well
as simplifying a whole host of places that were made more complicated
with the addition of id-mapped mounts and the bind sourcefd logic.
Because we join the container namespace, we can continue to use regular
O_PATH file descriptors for non-id-mapped bind-mount sources (which
means we don't have to raise the kernel requirement for that case).

In addition, we can easily add support for id-mappings that don't match
the container's user namespace. The approach taken here is to use Go's
officially supported mechanism for spawning a process in a user
namespace, but (ab)use PTRACE_TRACEME to avoid actually having to exec a
different process. The most efficient way to implement this would be to
do clone() in cgo directly to run a function that just does
kill(getpid(), SIGSTOP) -- we can always switch to that if it turns out
this approach is too slow. It should be noted that the included
micro-benchmark seems to indicate this is Fast Enough(TM):

  goos: linux
  goarch: amd64
  pkg: github.com/opencontainers/runc/libcontainer/userns
  cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
  BenchmarkSpawnProc
  BenchmarkSpawnProc-8        1670            770065 ns/op

Fixes: fda12ab101 ("Support idmap mounts on volumes")
Fixes: 9c444070ec ("Open bind mount sources from the host userns")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:40 +11:00
Kir Kolyshkin
efbebb39b5 libct: rename root to stateDir in struct Container
The name "root" (or "containerRoot") is confusing; one might think it is
the root of container's file system (the directory we chroot into).

Rename to stateDir for clarity.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-10-04 14:57:10 +11:00
Rodrigo Campos
0172016a53 libcontainer: Add generic parseFdsFromEnv()
We will add code that uses this function in future patches. So let's
just split it to a new function now and reuse it later.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-07-11 16:17:48 +02:00
Kir Kolyshkin
72657eac2e libct: move StartInitialization
No code change, just moving a function from factory_linux.go to
init_linux.go.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-17 12:46:27 -07:00
Kir Kolyshkin
82bc89cd10 runc run: refuse a non-empty cgroup
Commit d08bc0c1b3 ("runc run: warn on non-empty cgroup") introduced
a warning when a container is started in a non-empty cgroup. Such
configuration has lots of issues.

In addition to that, such configuration is not possible at all when
using the systemd cgroup driver.

As planned, let's promote this warning to an error, and fix the test
case accordingly.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-03-30 19:55:39 -07:00
Kir Kolyshkin
b44da4c05f libct: validateID: stop using regexp
Replace a regex with a simple for loop. Document the function.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-12-15 18:50:54 -08:00
Kir Kolyshkin
6662570110 libct: fix staticcheck warning
A new version of staticcheck (included into golangci-lint 1.46.2) gives
this new warning:

> libcontainer/factory_linux.go:230:59: SA9008: e refers to the result of a failed type assertion and is a zero value, not the value that was being type-asserted (staticcheck)
> 				err = fmt.Errorf("panic from initialization: %v, %s", e, debug.Stack())
> 				                                                      ^
> libcontainer/factory_linux.go:226:7: SA9008(related information): this is the variable being read (staticcheck)
> 			if e, ok := e.(error); ok {
> 			   ^

Apparently, this is indeed a bug. Fix by using a different name for a
new variable, so we can access the old one under "else".

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-06-16 16:31:42 -07:00
Kir Kolyshkin
b6967fa84c Decouple cgroup devices handling
This commit separates the functionality of setting cgroup device
rules out of libct/cgroups to libct/cgroups/devices package. This
package, if imported, sets the function variables in libct/cgroups and
libct/cgroups/systemd, so that a cgroup manager can use those to manage
devices. If those function variables are nil (when libct/cgroups/devices
are not imported), a cgroup manager returns the ErrDevicesUnsupported
in case any device rules are set in Resources.

It also consolidates the code from libct/cgroups/ebpf and
libct/cgroups/ebpf/devicefilter into libct/cgroups/devices.

Moved some tests in libct/cg/sd that require device management to
libct/sd/devices.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-05-18 11:17:08 -07:00
Kir Kolyshkin
102b8abd26 libct: rm BaseContainer and Container interfaces
The only implementation of these is linuxContainer. It does not make
sense to have an interface with a single implementation, and we do not
foresee other types of containers being added to runc.

Remove BaseContainer and Container interfaces, moving their methods
documentation to linuxContainer.

Rename linuxContainer to Container.

Adopt users from using interface to using struct.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-23 11:04:12 -07:00
Kir Kolyshkin
6a3fe1618f libcontainer: remove LinuxFactory
Since LinuxFactory has become the means to specify containers state
top directory (aka --root), and is only used by two methods (Create
and Load), it is easier to pass root to them directly.

Modify all the users and the docs accordingly.

While at it, fix Create and Load docs (those that were originally moved
from the Factory interface docs).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:44:31 -07:00
Kir Kolyshkin
6a29787bc9 libct/factory: make some methods functions
These do not have to be methods.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:29:24 -07:00
Kir Kolyshkin
8358a0ecbb libct: StartInitialization: decouple from factory
StartInitialization does not have to be a method of Factory (while
it is clear why it was done that way initially, now we only have
Linux containers so it does not make sense).

Fix callers and docs accordingly.

No change in functionality.

Also, since this was the only user of libcontainer.New with the empty
string as an argument, the corresponding check can now be removed
from it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:29:24 -07:00
Kir Kolyshkin
a78c9a0184 libct: remove Factory interface
The only implementation is LinuxFactory, let's use this directly.

Move the piece of documentation about Create from removed factory.go to
the factory_linux.go.

The LinuxFactory is to be removed later.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:29:24 -07:00
Kir Kolyshkin
71bc308b7f libct/New: remove options argument
It is not used by anyone.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:29:24 -07:00
Kir Kolyshkin
b6514469a8 libct: remove TmpfsRoot
Is is not used by anyone

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:29:24 -07:00
Kir Kolyshkin
dbd990d555 libct: rm intelrtd.Manager interface, NewIntelRdtManager
Remove intelrtd.Manager interface, since we only have a single
implementation, and do not expect another one.

Rename intelRdtManager to Manager, and modify its users accordingly.

Remove NewIntelRdtManager from factory.

Remove IntelRdtfs. Instead, make intelrdt.NewManager return nil if the
feature is not available.

Remove TestFactoryNewIntelRdt as it is now identical to TestFactoryNew.

Add internal function newManager to be used for tests (to make sure
some testing is done even when the feature is not available in
kernel/hardware).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 17:33:03 -08:00
Kir Kolyshkin
39bd7b7217 libct: Container, Factory: rm newuidmap/newgidmap
These were introduced in commit d8b669400 back in 2017, with a TODO
of "make binary names configurable". Apparently, everyone is happy with
the hardcoded names. In fact, they *are* configurable (by prepending the
PATH with a directory containing own version of newuidmap/newgidmap).

Now, these binaries are only needed in a few specific cases (when
rootless is set etc.), so let's look them up only when needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 11:40:29 -08:00
Kir Kolyshkin
0d21515038 libct: remove Validator interface
We only have one implementation of config validator, which is always
used. It makes no sense to have Validator interface.

Having validate.Validator field in Factory does not make sense for all
the same reasons.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 11:40:29 -08:00
Kir Kolyshkin
630c0d7e8c libct: Container, Factory: rm InitPath, InitArgs
Those are *always* /proc/self/exe init, and it does not make sense
to ever change these. More to say, if InitArgs option func (removed
by this commit) is used to change these parameters, it will break
things, since "init" is hardcoded elsewhere.

Remove this.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 11:40:29 -08:00
Akihiro Suda
e9190d3ae1 Merge pull request #3353 from kolyshkin/rm-criu-opt
runc: remove --criu option
2022-02-01 08:28:14 +09:00
Kir Kolyshkin
6e1d476aad runc: remove --criu option
This was introduced in an initial commit, back in the day when criu was
a highly experimental thing. Today it's not; most users who need it have
it packaged by their distro vendor.

The usual way to run a binary is to look it up in directories listed in
$PATH. This is flexible enough and allows for multiple scenarios (custom
binaries, extra binaries, etc.). This is the way criu should be run.

Make --criu a hidden option (thus removing it from help). Remove the
option from man pages, integration tests, etc. Remove all traces of
CriuPath from data structures.

Add a warning that --criu is ignored and will be removed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-26 20:25:56 -08:00
Kir Kolyshkin
485e6c84e7 Fix some revive warnings
This is needed since the future commits will touch this code, and then
the lint-extra CI job complains.

> libcontainer/factory.go#L245
> var-naming: var fdsJson should be fdsJSON (revive)

and

> libcontainer/init_linux.go#L181
> error-strings: error strings should not be capitalized or end with punctuation or a newline (revive)

and

> notify_socket.go#L94
> receiver-naming: receiver name n should be consistent with previous receiver name s for notifySocket (revive)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-26 19:14:14 -08:00
Mrunal Patel
403cda19e4 Merge pull request #3326 from kolyshkin/go118beta
ci: add go 1.18beta1
2022-01-25 16:01:25 -08:00
Kir Kolyshkin
907aefd43c libct: StartInitialization: fix %w related warning
(on Go 1.18 this is actually an error)

> libcontainer/factory_linux.go:341:10: fmt.Errorf format %w has arg e of wrong type interface{}

Unfortunately, fixing it results in an errorlint warning:

> libcontainer/factory_linux.go#L344 non-wrapping format verb for fmt.Errorf. Use `%w` to format errors (errorlint)

so we have to silence that one.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-12-15 17:20:09 -08:00
Kir Kolyshkin
024adbb1b9 libct: Create: rm unneeded chown
Commit 7cfb107f2c started using unix.Geteuid(), unix.Getegid() as
uid/gid argument for chown. It seems that it should have removed chown
entirely, since, according to mkdir(2),

> The newly created directory will be owned by the effective user ID of
> the process. If the directory containing the file has the
> set-group-ID bit set, or if the filesystem is mounted with BSD
> group semantics (mount -o bsdgroups or, synonymously mount -o grpid),
> the new directory will inherit the group ownership from its parent;
> otherwise it will be > owned by the effective group ID of the process.

So, the only effect of the chown after mkdir is ignoring the sgid bit
on the parent directory (which is probably not the right thing to do).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-12-10 18:49:25 -08:00
Kir Kolyshkin
b357bc1349 libct/factory: rm id param from loadState
It is not used since commit e918d0213.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
68c2b6a7d9 runc run: refuse a frozen cgroup
Sometimes a container cgroup already exists but is frozen.
When this happens, runc init hangs, and it's not clear what is going on.

Refuse to run in a frozen cgroup; add a test case.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-03 20:36:27 -07:00
Kir Kolyshkin
d08bc0c1b3 runc run: warn on non-empty cgroup
Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupPath in config.json). While such
shared configuration might be OK, there are some issues:

 - When each container has its own resource limits, the order of
   containers start determines whose limits will be effectively applied.

 - When one of containers is paused, all others are paused, too.

 - When a container is paused, any attempt to do runc create/run/exec
   end up with runc init stuck inside a frozen cgroup.

 - When a systemd cgroup manager is used, this becomes even worse -- such
   as, stop (or even failed start) of any container results in
   "stopTransientUnit" command being sent to systemd, and so (depending on
   unit properties) other containers can receive SIGTERM, be killed after a
   timeout etc.

Any of the above may lead to various hard-to-debug situations in production
(runc init stuck, cgroup removal error, wrong resource limits, init not
reaping zombies etc.).

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change though, so let's make it in
steps, with the first step is issue a warning and a deprecated notice
about a non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-03 20:36:27 -07:00
Alban Crequy
9c444070ec Open bind mount sources from the host userns
The source of the bind mount might not be accessible in a different user
namespace because a component of the source path might not be traversed
under the users and groups mapped inside the user namespace. This caused
errors such as the following:

  # time="2020-06-22T13:48:26Z" level=error msg="container_linux.go:367:
  starting container process caused: process_linux.go:459:
  container init caused: rootfs_linux.go:58:
  mounting \"/tmp/busyboxtest/source-inaccessible/dir\"
  to rootfs at \"/tmp/inaccessible\" caused:
  stat /tmp/busyboxtest/source-inaccessible/dir: permission denied"

To solve this problem, this patch performs the following:

1. in nsexec.c, it opens the source path in the host userns (so we have
   the right permissions to open it) but in the container mntns (so the
   kernel cross mntns mount check let us mount it later:
   https://github.com/torvalds/linux/blob/v5.8/fs/namespace.c#L2312).

2. in nsexec.c, it passes the file descriptors of the source to the
   child process with SCM_RIGHTS.

3. In runc-init in Golang, it finishes the mounts while inside the
   userns even without access to the some components of the source
   paths.

Passing the fds with SCM_RIGHTS is necessary because once the child
process is in the container mntns, it is already in the container userns
so it cannot temporarily join the host mntns.

This patch uses the existing mechanism with _LIBCONTAINER_* environment
variables to pass the file descriptors from runc to runc init.

This patch uses the existing mechanism with the Netlink-style bootstrap
to pass information about the list of source mounts to nsexec.c.

Rootless containers don't use this bind mount sources fdpassing
mechanism because we can't setns() to the target mntns in a rootless
container (we don't have the privileges when we are in the host userns).

This patch takes care of using O_CLOEXEC on mount fds, and close them
early.

Fixes: #2484.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-10-12 15:13:45 +02:00
Kir Kolyshkin
097c6d7425 libct/cg: simplify getting cgroup manager
1. Make Rootless and Systemd flags part of config.Cgroups.

2. Make all cgroup managers (not just fs2) return error (so it can do
   more initialization -- added by the following commits).

3. Replace complicated cgroup manager instantiation in factory_linux
   by a single (and simple) libcontainer/cgroups/manager.New() function.

4. getUnifiedPath is simplified to check that only a single path is
   supplied (rather than checking that other paths, if supplied,
   are the same).

[v2: can't -> cannot]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-23 09:11:44 -07:00
Alban Crequy
2b025c0173 Implement Seccomp Notify
This commit implements support for the SCMP_ACT_NOTIFY action. It
requires libseccomp-2.5.0 to work but runc still works with older
libseccomp if the seccomp policy does not use the SCMP_ACT_NOTIFY
action.

A new synchronization step between runc[INIT] and runc run is introduced
to pass the seccomp fd. runc run fetches the seccomp fd with pidfd_get
from the runc[INIT] process and sends it to the seccomp agent using
SCM_RIGHTS.

As suggested by @kolyshkin, we also make writeSync() a wrapper of
writeSyncWithFd() and wrap the error there. To avoid pointless errors,
we made some existing code paths just return the error instead of
re-wrapping it. If we don't do it, error will look like:

	writing syncT <act>: writing syncT: <err>

By adjusting the code path, now they just look like this
	writing syncT <act>: <err>

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-09-07 13:04:24 +02:00
xiadanni
64358c4de9 optimize log: move WriteJSON defer as early as possible
if function returns error before WriteJSON defer, error will not be
printed out, so move this defer as early as possible and use logrus to
print out error if returns before it.

Signed-off-by: xiadanni <xiadanni1@huawei.com>
2021-09-07 07:00:57 +08:00
Akihiro Suda
bd75bc2dc6 Merge pull request #3176 from kolyshkin/rm-config-error-alt
libct/error.go: rm ConfigError (alt)
2021-09-02 14:34:32 +09:00
Kir Kolyshkin
9ff64c3d97 *: rm redundant linux build tag
For files that end with _linux.go or _linux_test.go, there is no need to
specify linux build tag, as it is assumed from the file name.

In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go
for the file name to make sense.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:15:00 -07:00
Kir Kolyshkin
538ba846dd libct/error.go: rm ConfigError
ConfigError was added by commit e918d02139, while removing runc own
error system, to preserve a way for a libcontainer user to distinguish
between a configuration error and something else.

The way ConfigError is implemented requires a different type of check
(compared to all other errors defined by error.go). An attempt was made
to rectify this, but the resulting code became even more complicated.

As no one is using this functionality (of differentiating a "bad config"
type of error from other errors), let's just drop the ConfigError type.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-23 18:56:08 -07:00
Kir Kolyshkin
e918d02139 libcontainer: rm own error system
This removes libcontainer's own error wrapping system, consisting of a
few types and functions, aimed at typization, wrapping and unwrapping
of errors, as well as saving error stack traces.

Since Go 1.13 now provides its own error wrapping mechanism and a few
related functions, it makes sense to switch to it.

While doing that, improve some error messages so that they start
with "error", "unable to", or "can't".

A few things that are worth mentioning:

1. We lose stack traces (which were never shown anyway).

2. Users of libcontainer that relied on particular errors (like
   ContainerNotExists) need to switch to using errors.Is with
   the new errors defined in error.go.

3. encoding/json is unable to unmarshal the built-in error type,
   so we have to introduce initError and wrap the errors into it
   (basically passing the error as a string). This is the same
   as it was before, just a tad simpler (actually the initError
   is a type that got removed in commit afa844311; also suddenly
   ierr variable name makes sense now).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-24 10:21:04 -07:00
Kir Kolyshkin
a7cfb23b88 *: stop using pkg/errors
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
36aefad45d libct: wrap unix.Mount/Unmount errors
Errors returned by unix are bare. In some cases it's impossible to find
out what went wrong because there's is not enough context.

Add a mountError type (mostly copy-pasted from github.com/moby/sys/mount),
and mount/unmount helpers. Use these where appropriate, and convert error
checks to use errors.Is.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:37 -07:00
Kir Kolyshkin
563225d55f libct/StartInitialization: fix errors
Errors from strconv.Atoi are already descriptive enough, and contain the
value being converted, so our error messages do not need to contain it.

While at it, use %w to wrap errors.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 11:43:44 -07:00
Kir Kolyshkin
627a06ad92 Replace fmt.Errorf w/o %-style to errors.New
Using fmt.Errorf for errors that do not have %-style formatting
directives is an overkill. Switch to errors.New.

Found by

	git grep fmt.Errorf | grep -v ^vendor | grep -v '%'

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 11:42:07 -07:00
Kir Kolyshkin
e6048715e4 Use gofumpt to format code
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.

Brought to you by

	git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w

Looking at the diff, all these changes make sense.

Also, replace gofmt with gofumpt in golangci.yml.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-01 12:17:27 -07:00
Sebastiaan van Stijn
b45fbd43b8 errcheck: libcontainer
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-05-20 14:19:26 +02:00
Kir Kolyshkin
201d60c51d runc run/start/exec: fix init log forwarding race
Sometimes debug.bats test cases are failing like this:

> not ok 27 global --debug to --log --log-format 'json'
> # (in test file tests/integration/debug.bats, line 77)
> #   `[[ "${output}" == *"child process in init()"* ]]' failed

It happens more when writing to disk.

This issue is caused by the fact that runc spawns log forwarding goroutine
(ForwardLogs) but does not wait for it to finish, resulting in missing
debug lines from nsexec.

ForwardLogs itself, though, never finishes, because it reads from a
reading side of a pipe which writing side is not closed. This is
especially true in case of runc create, which spawns runc init and
exits; meanwhile runc init waits on exec fifo for arbitrarily long
time before doing execve.

So, to fix the failure described above, we need to:

 1. Make runc create/run/exec wait for ForwardLogs to finish;

 2. Make runc init close its log pipe file descriptor (i.e.
    the one which value is passed in _LIBCONTAINER_LOGPIPE
    environment variable).

This is exactly what this commit does:

 1. Amend ForwardLogs to return a channel, and wait for it in start().

 2. In runc init, save the log fd and close it as late as possible.

PS I have to admit I still do not understand why an explicit close of
log pipe fd is required in e.g. (*linuxSetnsInit).Init, right before
the execve which (thanks to CLOEXEC) closes the fd anyway.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-03-25 19:18:55 -07:00
Kenta Tada
8a3484b736 libcontainer/factory*: adjust the file mode
This commit adjusts the file mode to use the latest golang style
Related to #2625

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2021-03-02 16:26:36 +09:00
Iceber Gu
916654fffa libcontainer: fix LinuxFactory comments
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-02-24 15:51:06 +08:00
Xiaochen Shen
325a74ddec libcontainer/intelrdt: rm init() from intelrdt.go
Use sync.Once to init Intel RDT when needed for a small speedup to
operations which do not require Intel RDT.

Simplify IntelRdtManager initialization in LinuxFactory.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-12-16 23:37:31 +08:00
Xiaochen Shen
f62ad4a0de libcontainer/intelrdt: rename CAT and MBA enabled flags
Rename CAT and MBA enabled flags to be consistent with others.
No functional change.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-11-10 15:32:01 +08:00