109 Commits

Author SHA1 Message Date
Kir Kolyshkin
87b8f974c8 setupIO: close conn on error
While it does not make much sense practically, as runc is going to exit
soon and all fds will be closed anyway, various linters (including
SVACE) keep reporting this.

Let's make them happy.

Reported-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
Reported-by: Mikhail Dmitrichenko <m.dmitrichenko222@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-07-29 14:05:40 -07:00
Kir Kolyshkin
7d2161f807 setupIO: simplify getting net.UnixConn
The typecast can't fail, so it doesn't make sense checking for errors
here.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-07-28 16:56:53 -07:00
Kir Kolyshkin
c5ab4b6e30 runc pause/unpause/ps: get rid of excessive warning
This issue was originally reported in podman PR 25792.

When calling runc pause/unpause for an ordinary user, podman do not
provide --systemd-cgroups option, and shouldUseRootlessCgroupManager
returns true. This results in a warning:

	$ podman pause sleeper
	WARN[0000] runc pause may fail if you don't have the full access to cgroups
	sleeper

Actually, it does not make sense to call shouldUseRootlessCgroupManager
at this point, because we already know if we're rootless or not, from
the container state.json (same for systemd).

Also, busctl binary is not available either in this context, so
shouldUseRootlessCgroupManager would not work properly.

Finally, it doesn't really matter if we use systemd or not, because we
use fs/fs2 manager to freeze/unfreeze, and it will return something like
EPERM (or tell that cgroups is not configured, for a true rootless
container).

So, let's only print the warning after pause/unpause failed,
if the error returned looks like a permission error.

Same applies to "runc ps".

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-04-08 14:00:56 -07:00
Kir Kolyshkin
10ca66bff5 runc exec: implement CPU affinity
As per
- https://github.com/opencontainers/runtime-spec/pull/1253
- https://github.com/opencontainers/runtime-spec/pull/1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-02 19:17:41 -08:00
lifubang
d92dd22611 performance improvement: setup signal notify in a new go routine
There is a big loop(at least 65 times) in `signal.Notify`, it costs as much
time as `runc init`, so we can call it in parallel ro reduce the container
start time. In a general test, it can be reduced about 38.70%.

Signed-off-by: lifubang <lifubang@acmcoder.com>
(cyphar: move signal channel definition inside goroutine)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-03-01 23:46:25 +11:00
Kir Kolyshkin
4b87c7d4fd Fixups for newProcess
1. Pass an argument as a pointer rather than copying the whole structure.
   It was a pointer initially, but this has changed in commit b2d9d996
   without giving a reason why.

2. The newProcess description was added by commit 9fac18329 (yes, the
   very first one) and hasn't changed since. As of commit 29b139f7,
   the part of it which says "and stdio from the current process"
   is no longer valid.

   Remove it, and while at it, rewrite the description entirely.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 22:56:11 -08:00
Kir Kolyshkin
8fbdb7e78e setupIO: optimize
The rootuid and rootgid are only needed when detach and createTTY are
both false. We also call c.Config() twice, every time creating a copy
of struct Config.

Solve both issues by passing container pointer to setupIO, and get
rootuid/rootgid only when we need those.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 22:56:11 -08:00
Kir Kolyshkin
7dc2486889 libct: switch to numeric UID/GID/groups
This addresses the following TODO in the code (added back in 2015
by commit 845fc65e5):

> // TODO: fix libcontainer's API to better support uid/gid in a typesafe way.

Historically, libcontainer internally uses strings for user, group, and
additional (aka supplementary) groups.

Yet, runc receives those credentials as part of runtime-spec's process,
which uses integers for all of them (see [1], [2]).

What happens next is:

1. runc start/run/exec converts those credentials to strings (a User
   string containing "UID:GID", and a []string for additional GIDs) and
   passes those onto runc init.
2. runc init converts them back to int, in the most complicated way
   possible (parsing container's /etc/passwd and /etc/group).

All this conversion and, especially, parsing is totally unnecessary,
but is performed on every container exec (and start).

The only benefit of all this is, a libcontainer user could use user and
group names instead of numeric IDs (but runc itself is not using this
feature, and we don't know if there are any other users of this).

Let's remove this back and forth translation, hopefully increasing
runc exec performance.

The only remaining need to parse /etc/passwd is to set HOME environment
variable for a specified UID, in case $HOME is not explicitly set in
process.Env. This can now be done right in prepareEnv, which simplifies
the code flow a lot. Alas, we can not use standard os/user.LookupId, as
it could cache host's /etc/passwd or the current user (even with the
osusergo tag).

PS Note that the structures being changed (initConfig and Process) are
never saved to disk as JSON by runc, so there is no compatibility issue
for runc users.

Still, this is a breaking change in libcontainer, but we never promised
that libcontainer API will be stable (and there's a special package
that can handle it -- github.com/moby/sys/user). Reflect this in
CHANGELOG.

For 3998.

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#posix-platform-user
[2]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/specs-go/config.go#L86

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 17:49:17 -08:00
Kir Kolyshkin
ec465d39f5 utils: simplify newProcess
This code is not in libcontainer, meaning it is only used by a short lived
binary (runc start/run/exec). Unlike code in libcontainer (see
CreateLibcontainerConfig), here we don't have to care about copying the
structures supplied as input, meaning we can just reuse the pointers
directly.

Fixes: bfbd0305 ("Add I/O priority")
Fixes: 770728e1 ("Support `process.scheduler`")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-12-22 18:06:20 -08:00
Kir Kolyshkin
177c7d4f59 Fix codespell warnings
./features.go:30: tru ==> through, true
...
./utils_linux.go:147: infront ==> in front

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-05-23 22:02:45 -07:00
utam0k
bfbd0305ba Add I/O priority
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-03-30 22:31:54 +09:00
Aleksa Sarai
8e8b136c49 tree-wide: use /proc/thread-self for thread-local state
With the idmap work, we will have a tainted Go thread in our
thread-group that has a different mount namespace to the other threads.
It seems that (due to some bad luck) the Go scheduler tends to make this
thread the thread-group leader in our tests, which results in very
baffling failures where /proc/self/mountinfo produces gibberish results.

In order to avoid this, switch to using /proc/thread-self for everything
that is thread-local. This primarily includes switching all file
descriptor paths (CLONE_FS), all of the places that check the current
cgroup (technically we never will run a single runc thread in a separate
cgroup, but better to be safe than sorry), and the aforementioned
mountinfo code. We don't need to do anything for the following because
the results we need aren't thread-local:

 * Checks that certain namespaces are supported by stat(2)ing
   /proc/self/ns/...

 * /proc/self/exe and /proc/self/cmdline are not thread-local.

 * While threads can be in different cgroups, we do not do this for the
   runc binary (or libcontainer) and thus we do not need to switch to
   the thread-local version of /proc/self/cgroups.

 * All of the CLONE_NEWUSER files are not thread-local because you
   cannot set the usernamespace of a single thread (setns(CLONE_NEWUSER)
   is blocked for multi-threaded programs).

Note that we have to use runtime.LockOSThread when we have an open
handle to a tid-specific procfs file that we are operating on multiple
times. Go can reschedule us such that we are running on a different
thread and then kill the original thread (causing -ENOENT or similarly
confusing errors). This is not strictly necessary for most usages of
/proc/thread-self (such as using /proc/thread-self/fd/$n directly) since
only operating on the actual inodes associated with the tid requires
this locking, but because of the pre-3.17 fallback for CentOS, we have
to do this in most cases.

In addition, CentOS's kernel is too old for /proc/thread-self, which
requires us to emulate it -- however in rootfs_linux.go, we are in the
container pid namespace but /proc is the host's procfs. This leads to
the incredibly frustrating situation where there is no way (on pre-4.1
Linux) to figure out which /proc/self/task/... entry refers to the
current tid. We can just use /proc/self in this case.

Yes this is all pretty ugly. I also wish it wasn't necessary.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:41 +11:00
Kir Kolyshkin
7396ca90fa runc delete: do not ignore error from destroy
If container.Destroy() has failed, runc destroy still return 0, which is
wrong and can result in other issues down the line.

Let's always return error from destroy in runc delete.

For runc checkpoint and runc run, we still treat it as a warning.

Co-authored-by: Zhang Tianyang <burning9699@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-11-27 09:15:39 -08:00
lfbzhm
95a93c132c Merge pull request #4045 from fuweid/support-pidfd-socket
[feature request] *: introduce pidfd-socket flag
2023-11-22 09:13:55 +08:00
Wei Fu
94505a046a *: introduce pidfd-socket flag
The container manager like containerd-shim can't use cgroup.kill feature or
freeze all the processes in cgroup to terminate the exec init process.
It's unsafe to call kill(2) since the pid can be recycled. It's good to
provide the pidfd of init process through the pidfd-socket. It's similar to
the console-socket. With the pidfd, the container manager like containerd-shim
can send the signal to target process safely.

And for the standard init process, we can have polling support to get
exit event instead of blocking on wait4.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-21 18:28:50 +08:00
Aleksa Sarai
7c71a22705 rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT
The original reasoning for this option was to avoid having mount options
be overwritten by runc. However, adding command-line arguments has
historically been a bad idea because it forces strict-runc-compatible
OCI runtimes to copy out-of-spec features directly from runc and these
flags are usually quite difficult to enable by users when using runc
through several layers of engines and orchestrators.

A far more preferable solution is to have a heuristic which detects
whether copying the original mount's mount options would override an
explicit mount option specified by the user. In this case, we should
return an error. You only end up in this path in the userns case, if you
have a bind-mount source with locked flags.

During the course of writing this patch, I discovered that several
aspects of our handling of flags for bind-mounts left much to be
desired. We have completely botched the handling of explicitly cleared
flags since commit 97f5ee4e6a ("Only remount if requested flags differ
from current"), with our behaviour only becoming increasingly more weird
with 50105de1d8 ("Fix failure with rw bind mount of a ro fuse") and
da780e4d27 ("Fix bind mounts of filesystems with certain options
set"). In short, we would only clear flags explicitly request by the
user purely by chance, in ways that it really should've been reported to
us by now. The most egregious is that mounts explicitly marked "rw" were
actually mounted "ro" if the bind-mount source was "ro" and no other
special flags were included. In addition, our handling of atime was
completely broken -- mostly due to how subtle the semantics of atime are
on Linux.

Unfortunately, while the runtime-spec requires us to implement
mount(8)'s behaviour, several aspects of the util-linux mount(8)'s
behaviour are broken and thus copying them makes little sense. Since the
runtime-spec behaviour for this case (should mount options for a "bind"
mount use the "mount --bind -o ..." or "mount --bind -o remount,..."
semantics? Is the fallback code we have for userns actually
spec-compliant?) and the mount(8) behaviour (see [1]) are not
well-defined, this commit simply fixes the most obvious aspects of the
behaviour that are broken while keeping the current spirit of the
implementation.

NOTE: The handling of atime in the base case is left for a future PR to
deal with. This means that the atime of the source mount will be
silently left alone unless the fallback path needs to be taken, and any
flags not explicitly set will be cleared in the base case. Whether we
should always be operating as "mount --bind -o remount,..." (where we
default to the original mount source flags) is a topic for a separate PR
and (probably) associated runtime-spec PR.

So, to resolve this:

* We store which flags were explicitly requested to be cleared by the
  user, so that we can detect whether the userns fallback path would end
  up setting a flag the user explicitly wished to clear. If so, we
  return an error because we couldn't fulfil the configuration settings.

* Revert 97f5ee4e6a ("Only remount if requested flags differ from
  current"), as missing flags do not mean we can skip MS_REMOUNT (in
  fact, missing flags are how you indicate a flag needs to be cleared
  with mount(2)). The original purpose of the patch was to fix the
  userns issue, but as mentioned above the correct mechanism is to do a
  fallback mount that copies the lockable flags from statfs(2).

* Improve handling of atime in the fallback case by:
    - Correctly handling the returned flags in statfs(2).
    - Implement the MNT_LOCK_ATIME checks in our code to ensure we
      produce errors rather than silently producing incorrect atime
      mounts.

* Improve the tests so we correctly detect all of these contingencies,
  including a general "bind-mount atime handling" test to ensure that
  the behaviour described here is accurate.

This change also inlines the remount() function -- it was only ever used
for the bind-mount remount case, and its behaviour is very bind-mount
specific.

[1]: https://github.com/util-linux/util-linux/issues/2433

Reverts: 97f5ee4e6a ("Only remount if requested flags differ from current")
Fixes: 50105de1d8 ("Fix failure with rw bind mount of a ro fuse")
Fixes: da780e4d27 ("Fix bind mounts of filesystems with certain options set")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-10-24 17:28:25 +11:00
utam0k
770728e16e Support process.scheduler
Spec: https://github.com/opencontainers/runtime-spec/pull/1188
Fix: https://github.com/opencontainers/runc/issues/3895

Co-authored-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: utam0k <k0ma@utam0k.jp>
Signed-off-by: lifubang <lifubang@acmcoder.com>
2023-10-04 15:53:18 +08:00
Ruediger Pluem
da780e4d27 Fix bind mounts of filesystems with certain options set
Currently bind mounts of filesystems with nodev, nosuid, noexec,
noatime, relatime, strictatime, nodiratime options set fail in rootless
mode if the same options are not set for the bind mount.
For ro filesystems this was resolved by #2570 by remounting again
with ro set.

Follow the same approach for nodev, nosuid, noexec, noatime, relatime,
strictatime, nodiratime but allow to revert back to the old behaviour
via the new `--no-mount-fallback` command line option.

Add a testcase to verify that bind mounts of filesystems with nodev,
nosuid, noexec, noatime options set work in rootless mode.
Add a testcase that mounts a nodev, nosuid, noexec, noatime filesystem
with a ro flag.
Add two further testcases that ensure that the above testcases would
fail if the `--no-mount-fallback` command line option is set.

* contrib/completions/bash/runc:
      Add `--no-mount-fallback` command line option for bash completion.

* create.go:
      Add `--no-mount-fallback` command line option.

* restore.go:
      Add `--no-mount-fallback` command line option.

* run.go:
      Add `--no-mount-fallback` command line option.

* libcontainer/configs/config.go:
      Add `NoMountFallback` field to the `Config` struct to store
      the command line option value.

* libcontainer/specconv/spec_linux.go:
      Add `NoMountFallback` field to the `CreateOpts` struct to store
      the command line option value and store it in the libcontainer
      config.

* utils_linux.go:
      Store the command line option value in the `CreateOpts` struct.

* libcontainer/rootfs_linux.go:
      In case that `--no-mount-fallback` is not set try to remount the
      bind filesystem again with the options nodev, nosuid, noexec,
      noatime, relatime, strictatime or nodiratime if they are set on
      the source filesystem.

* tests/integration/mounts_sshfs.bats:
      Add testcases and rework sshfs setup to allow specifying
      different mount options depending on the test case.

Signed-off-by: Ruediger Pluem <ruediger.pluem@vodafone.com>
2023-07-28 16:32:02 -07:00
Kir Kolyshkin
102b8abd26 libct: rm BaseContainer and Container interfaces
The only implementation of these is linuxContainer. It does not make
sense to have an interface with a single implementation, and we do not
foresee other types of containers being added to runc.

Remove BaseContainer and Container interfaces, moving their methods
documentation to linuxContainer.

Rename linuxContainer to Container.

Adopt users from using interface to using struct.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-23 11:04:12 -07:00
Kir Kolyshkin
6a3fe1618f libcontainer: remove LinuxFactory
Since LinuxFactory has become the means to specify containers state
top directory (aka --root), and is only used by two methods (Create
and Load), it is easier to pass root to them directly.

Modify all the users and the docs accordingly.

While at it, fix Create and Load docs (those that were originally moved
from the Factory interface docs).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:44:31 -07:00
Kir Kolyshkin
40b0088681 loadFactory: remove
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-18 16:05:29 -08:00
Kir Kolyshkin
36786c361a list, utils: remove redundant code
The value of root is already an absolute path since commit
ede8a86ec1, so it does not make sense to call filepath.Abs()
again.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-18 16:05:29 -08:00
Kir Kolyshkin
dbd990d555 libct: rm intelrtd.Manager interface, NewIntelRdtManager
Remove intelrtd.Manager interface, since we only have a single
implementation, and do not expect another one.

Rename intelRdtManager to Manager, and modify its users accordingly.

Remove NewIntelRdtManager from factory.

Remove IntelRdtfs. Instead, make intelrdt.NewManager return nil if the
feature is not available.

Remove TestFactoryNewIntelRdt as it is now identical to TestFactoryNew.

Add internal function newManager to be used for tests (to make sure
some testing is done even when the feature is not available in
kernel/hardware).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 17:33:03 -08:00
Kir Kolyshkin
39bd7b7217 libct: Container, Factory: rm newuidmap/newgidmap
These were introduced in commit d8b669400 back in 2017, with a TODO
of "make binary names configurable". Apparently, everyone is happy with
the hardcoded names. In fact, they *are* configurable (by prepending the
PATH with a directory containing own version of newuidmap/newgidmap).

Now, these binaries are only needed in a few specific cases (when
rootless is set etc.), so let's look them up only when needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 11:40:29 -08:00
Kir Kolyshkin
6e1d476aad runc: remove --criu option
This was introduced in an initial commit, back in the day when criu was
a highly experimental thing. Today it's not; most users who need it have
it packaged by their distro vendor.

The usual way to run a binary is to look it up in directories listed in
$PATH. This is flexible enough and allows for multiple scenarios (custom
binaries, extra binaries, etc.). This is the way criu should be run.

Make --criu a hidden option (thus removing it from help). Remove the
option from man pages, integration tests, etc. Remove all traces of
CriuPath from data structures.

Add a warning that --criu is ignored and will be removed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-26 20:25:56 -08:00
Kir Kolyshkin
86733013cc notify_socket: setupSpec: drop ctx arg and return value
Those were never used (ctx was added by the initial commit, and
error was added by commit 25fd4a6757).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
3648346572 tty: ClosePostStart: rm return value
It is not and was not ever used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
f3f4b6d155 tty: recvtty: rm process arg
It is not used since commit 00a0ecf554.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
e63186351b tty: rm inheritStdio return value
Since commit eebdb644f9 this function never returns any error.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
d23b810927 checkpoint: rm getDefaultImagePath arg
It was never used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
0202c398ff runc exec: implement --cgroup
In some setups, multiple cgroups are used inside a container,
and sometime there is a need to execute a process in a particular
sub-cgroup (in case of cgroup v1, for a particular controller).
This is what this commit implements.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-27 10:25:42 -07:00
Kir Kolyshkin
097c6d7425 libct/cg: simplify getting cgroup manager
1. Make Rootless and Systemd flags part of config.Cgroups.

2. Make all cgroup managers (not just fs2) return error (so it can do
   more initialization -- added by the following commits).

3. Replace complicated cgroup manager instantiation in factory_linux
   by a single (and simple) libcontainer/cgroups/manager.New() function.

4. getUnifiedPath is simplified to check that only a single path is
   supplied (rather than checking that other paths, if supplied,
   are the same).

[v2: can't -> cannot]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-23 09:11:44 -07:00
Kir Kolyshkin
9ba2f65d6b startContainer: minor refactor
All three callers* of startContainer call revisePidFile and createSpec
before calling it, so it makes sense to move those calls to inside of
the startContainer, and drop the spec argument.

* -- in fact restore does not call revisePidFile, but it should.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-14 10:53:11 -07:00
Kir Kolyshkin
6c4a3b13d1 runc init: pass _LIBCONTAINER_LOGLEVEL as int
Instead of passing _LIBCONTAINER_LOGLEVEL as a string
(like "debug" or "info"), use a numeric value.

Also, simplify the init log level passing code -- since we actually use
the same level as the runc binary, just get it from logrus.

This is a preparation for the next commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-09 14:57:20 -07:00
Kir Kolyshkin
0a3577c680 utils_linux: simplify newProcess
newProcess do not need those extra arguments, they can be handled
in the caller.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-09 14:57:20 -07:00
Akihiro Suda
5fb9b2a006 Merge pull request #3185 from kolyshkin/go117-build-tags
Add go:build tags
2021-09-02 13:35:33 +09:00
Kir Kolyshkin
9ff64c3d97 *: rm redundant linux build tag
For files that end with _linux.go or _linux_test.go, there is no need to
specify linux build tag, as it is assumed from the file name.

In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go
for the file name to make sense.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:15:00 -07:00
lifubang
cb824629ba proposal: add --keep to runc run
Signed-off-by: lifubang <lifubang@acmcoder.com>
2021-08-02 12:51:36 -07:00
Kir Kolyshkin
a7cfb23b88 *: stop using pkg/errors
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
e6048715e4 Use gofumpt to format code
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.

Brought to you by

	git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w

Looking at the diff, all these changes make sense.

Also, replace gofmt with gofumpt in golangci.yml.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-01 12:17:27 -07:00
Kir Kolyshkin
719d70d2e3 setupIO: simplify code
There's no need to check err for nil.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-01-07 13:33:41 -08:00
Xiaochen Shen
325a74ddec libcontainer/intelrdt: rm init() from intelrdt.go
Use sync.Once to init Intel RDT when needed for a small speedup to
operations which do not require Intel RDT.

Simplify IntelRdtManager initialization in LinuxFactory.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-12-16 23:37:31 +08:00
Xiaochen Shen
f62ad4a0de libcontainer/intelrdt: rename CAT and MBA enabled flags
Rename CAT and MBA enabled flags to be consistent with others.
No functional change.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-11-10 15:32:01 +08:00
Sebastiaan van Stijn
8bf216728c use string-concatenation instead of sprintf for simple cases
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-30 10:51:59 +02:00
Kir Kolyshkin
b79cb04886 runc run/exec: fix terminal wrt stdin redirection
This fixes the following failure:

> sudo runc run -b bundle ctr </dev/null
> WARN[0000] exit status 2
> ERRO[0000] container_linux.go:367: starting container process caused: process_linux.go:459: container init caused:

The "exit status 2" with no error message is caused by SIGHUP
which is sent to init by the kernel when we are losing the
controlling terminal. If we choose to ignore that, we'll get
panic in console.Current(), which is addressed by [1].

Otherwise, the issue here is simple: the code assumes stdin
is opened to a terminal, and fails to work otherwise. Some
standard Linux tools (e.g. stty, top) do the same (modulo panic),
while some others (reset, tput) use the trick of trying
all the three std streams (starting with stderr as it is least likely
to be redirected), and if all three fails, open /dev/tty.

This commit does a similar thing (see initHostConsole).

It also replaces the call to console.Current(), which may panic
(see [1]), by reusing the t.hostConsole.

Finally, a simple test case is added.

Fixes: https://github.com/opencontainers/runc/issues/2485

[1] https://github.com/containerd/console/pull/37

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-08-20 07:56:20 -07:00
zvier
92e2175de1 cleancode: clean code for utils_linux.go
Signed-off-by: Jeff Zvier <zvier20@gmail.com>
2020-07-23 06:12:27 +08:00
John Hwang
5aa0601a59 validateProcessSpec: prevent SEGV when config is valid json, but invalid.
Signed-off-by: John Hwang <John.F.Hwang@gmail.com>
2020-05-18 09:38:22 -07:00
John Hwang
7fc291fd45 Replace formatted errors when unneeded
Signed-off-by: John Hwang <John.F.Hwang@gmail.com>
2020-05-16 18:13:21 -07:00
Kir Kolyshkin
2b31437caa Merge pull request #2281 from AkihiroSuda/rootless-systemd
cgroup v2: support rootless systemd

LGTMs: kolyshkin, mrunalp
2020-05-07 21:45:52 -07:00
Akihiro Suda
bf15cc99b1 cgroup v2: support rootless systemd
Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 .

$ podman --cgroup-manager=systemd run -it --rm --runtime=runc \
  --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine
/ # cat /proc/self/cgroup
0::/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max
44040192
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max
42000 100000
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max
42

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-05-08 12:39:20 +09:00