This simplifies the code flow and basically removes the last
filepath.Clean, which is not necessary in either case:
- for an absolute path, a single filepath.Clean is enough (as it is
guaranteed to remove all dot and dot-dot elements);
- for a relative path, filepath.Rel calls Clean at the end
(which is even documented).
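
The two cases above can be checked with a minimal sketch. The helper
names cleanAbs and relTo are hypothetical and not part of runc, and the
paths assume a Unix-like system:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cleanAbs normalizes an absolute path with a single filepath.Clean call.
// (Hypothetical helper, for illustration only.)
func cleanAbs(path string) string {
	return filepath.Clean(path)
}

// relTo returns target relative to base. filepath.Rel calls Clean on its
// result, so no extra filepath.Clean is needed afterwards.
func relTo(base, target string) (string, error) {
	return filepath.Rel(base, target)
}

func main() {
	abs := cleanAbs("/a/./b/../c")
	// A second Clean is a no-op on an already cleaned path.
	fmt.Println(abs, abs == filepath.Clean(abs))

	rel, err := relTo("/a/b", "/a/c/./d")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rel, rel == filepath.Clean(rel))
}
```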
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 770728e1 added Scheduler field into both Config and Process,
but forgot to add a mechanism to actually use Process.Scheduler.
As a result, runc exec never sets Process.Scheduler.
Fix it, and add a test case (which fails before the fix).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit bfbd0305b added IOPriority field into both Config and Process,
but forgot to add a mechanism to actually use Process.IOPriority.
As a result, runc exec never sets Process.IOPriority.
Fix it, and add a test case (which fails before the fix).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For all other properties that are available in both Config and Process,
the merging is performed by newInitConfig.
Let's do the same for Capabilities for the sake of code uniformity.
Also, thanks to the previous commit, we no longer have to make sure we
do not call capabilities.New(nil).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In runtime-spec, the capabilities property is optional, but
libcontainer/capabilities panics when New(nil) is called.
Because of this, there's a kludge in finalizeNamespace to ensure
capabilities.New is not called with a nil argument, and there's a
TestProcessEmptyCaps to ensure runc won't panic.
Let's fix this at the source, allowing libct/cap to work with nil
capabilities.
(The caller is fixed by the next commit.)
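
A minimal sketch of the idea of a nil-tolerant constructor. The types
capConfig and caps and the function newCaps are hypothetical stand-ins;
the actual libct/cap types and fix may look different:

```go
package main

import "fmt"

// capConfig is a hypothetical stand-in for the optional capabilities
// section of a process configuration.
type capConfig struct {
	Bounding  []string
	Effective []string
}

// caps is a hypothetical wrapper returned by newCaps.
type caps struct {
	bounding  []string
	effective []string
}

// newCaps tolerates a nil config and returns an empty set instead of
// dereferencing a nil pointer.
func newCaps(c *capConfig) *caps {
	if c == nil {
		return &caps{}
	}
	return &caps{bounding: c.Bounding, effective: c.Effective}
}

// count reports how many capabilities are configured in total.
func (c *caps) count() int {
	return len(c.bounding) + len(c.effective)
}

func main() {
	fmt.Println(newCaps(nil).count())
	fmt.Println(newCaps(&capConfig{Bounding: []string{"CAP_CHOWN"}}).count())
}
```

With the nil check inside the constructor, callers no longer need their
own guard before calling it.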
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
They are passed in initConfig twice, so it does not make sense.
NB: the alternative to that would be to remove Config field from
initConfig, but it results in a much bigger patch and more maintenance
down the road.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is one of the dark corners of runc / libcontainer, so let's shed
some light on it.
initConfig is a structure which is filled in [mostly] by newInitConfig,
and one of its hidden aspects is that it contains a process config which is
the result of merging the container and the process configs.
Let's document how all this happens, where the fields are coming from,
which one has a preference, and how it all works.
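
A rough sketch of what such a merge can look like for one field. This
assumes that a value set on the process takes precedence over the
container-wide one, which this message does not spell out; all type and
function names here are hypothetical:

```go
package main

import "fmt"

// scheduler, containerConfig and process are hypothetical, simplified
// stand-ins for the real structures.
type scheduler struct{ Policy string }

type containerConfig struct {
	Scheduler *scheduler
}

type process struct {
	Scheduler *scheduler
}

// mergedScheduler returns the process-level value when it is set and
// falls back to the container-level value otherwise.
// (Assumed preference order, for illustration only.)
func mergedScheduler(cfg *containerConfig, p *process) *scheduler {
	if p.Scheduler != nil {
		return p.Scheduler
	}
	return cfg.Scheduler
}

func main() {
	cfg := &containerConfig{Scheduler: &scheduler{Policy: "SCHED_OTHER"}}

	fmt.Println(mergedScheduler(cfg, &process{}).Policy)
	fmt.Println(mergedScheduler(cfg, &process{Scheduler: &scheduler{Policy: "SCHED_BATCH"}}).Policy)
}
```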
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Every time we call container.Config(), a new copy of
struct Config is created and returned, and we do it twice here.
Accessing container.config directly fixes this.
Fixes: 805b8c73d ("Do not create exec fifo in factory.Create")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Avoid splitting mount data into []string if it does not contain
options we're interested in. This should result in slightly less
garbage to collect.
2. Use if / else if instead of continue, to make it clearer that
we're processing one option at a time.
3. Print the whole option as a string in an error message; practically
this should not have any effect, it's just simpler.
4. Improve some comments.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.CutPrefix (available since Go 1.20) instead of
strings.HasPrefix and/or strings.TrimPrefix makes the code
a tad more straightforward.
No functional change.
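
To illustrate the difference, here is a small sketch with two
hypothetical helpers (valueOld and valueNew are not runc functions) that
return the same results:

```go
package main

import (
	"fmt"
	"strings"
)

// valueOld uses HasPrefix followed by TrimPrefix, so the prefix is
// checked twice.
func valueOld(s, prefix string) (string, bool) {
	if strings.HasPrefix(s, prefix) {
		return strings.TrimPrefix(s, prefix), true
	}
	return "", false
}

// valueNew uses strings.CutPrefix (Go 1.20+), which checks the prefix
// once and reports whether it was found.
func valueNew(s, prefix string) (string, bool) {
	v, ok := strings.CutPrefix(s, prefix)
	if !ok {
		return "", false
	}
	return v, true
}

func main() {
	fmt.Println(valueOld("key=val", "key="))
	fmt.Println(valueNew("key=val", "key="))
	fmt.Println(valueNew("other", "key="))
}
```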
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.HasPrefix with strings.TrimPrefix results in doing the
same thing (checking if prefix exists) twice. In this case, using
strings.TrimPrefix right away is sufficient.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. GetCgroupParamUint: drop strings.TrimSpace since it was already
done by GetCgroupParamString.
2. GetCgroupParamInt: use GetCgroupParamString, drop strings.TrimSpace.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It makes sense to report an error if a key or a value is empty,
as we don't expect anything like this.
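
A minimal sketch of such a check, assuming a hypothetical "key value"
line format with a single space as the separator (parseKeyValue here is
an illustration, not the actual function being changed):

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValue splits a "key value" line and rejects lines where the
// separator is missing or where either side is empty.
// (Hypothetical helper; the separator is an assumption.)
func parseKeyValue(line string) (key, value string, err error) {
	key, value, ok := strings.Cut(line, " ")
	if !ok {
		return "", "", fmt.Errorf("line %q is not in \"key value\" format", line)
	}
	if key == "" || value == "" {
		return "", "", fmt.Errorf("line %q has an empty key or value", line)
	}
	return key, value, nil
}

func main() {
	fmt.Println(parseKeyValue("usage 42"))
	fmt.Println(parseKeyValue("usage "))
}
```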
Reported-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.CutPrefix (added in Go 1.20, see [1]) results in faster and
cleaner code with fewer allocations (as the code only allocates memory
for the value, and does it once).
While at it, improve the function documentation.
[1]: https://github.com/golang/go/issues/42537
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.Cut (added in Go 1.18, see [1]) results in faster and
cleaner code with fewer allocations (as we're not using a slice).
Also, use switch in parseRdmaKV.
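
A sketch of the strings.Cut plus switch pattern. The rdmaLimits type and
parseRdmaEntry function are hypothetical (this is not the actual
parseRdmaKV), and the "key=value" entries with an optional "max" value
are an assumption about the input format:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// rdmaLimits is a hypothetical holder for two RDMA counters.
type rdmaLimits struct {
	HcaHandles uint32
	HcaObjects uint32
}

// parseRdmaEntry parses a single "key=value" entry and stores the value
// in the matching field. strings.Cut avoids allocating a slice the way
// strings.Split would.
func parseRdmaEntry(entry string, out *rdmaLimits) error {
	key, raw, ok := strings.Cut(entry, "=")
	if !ok {
		return fmt.Errorf("invalid entry %q", entry)
	}

	var value uint32
	if raw == "max" {
		value = math.MaxUint32
	} else {
		v, err := strconv.ParseUint(raw, 10, 32)
		if err != nil {
			return fmt.Errorf("invalid value in %q: %w", entry, err)
		}
		value = uint32(v)
	}

	switch key {
	case "hca_handle":
		out.HcaHandles = value
	case "hca_object":
		out.HcaObjects = value
	default:
		return fmt.Errorf("unknown key %q", key)
	}
	return nil
}

func main() {
	var l rdmaLimits
	for _, e := range []string{"hca_handle=2", "hca_object=max"} {
		if err := parseRdmaEntry(e, &l); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(l.HcaHandles, l.HcaObjects)
}
```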
[1]: https://github.com/golang/go/issues/46336
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.Cut (added in Go 1.18, see [1]) results in faster and
cleaner code with fewer allocations (as we're not using a slice).
This code is tested by TestStatCPUPSI.
[1]: https://github.com/golang/go/issues/46336
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.Cut (added in Go 1.18, see [1]) results in faster and
cleaner code with fewer allocations (as we're not using a slice).
The code is tested by testCgroupResourcesUnified.
[1]: https://github.com/golang/go/issues/46336
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For cgroup v2, we always expect /proc/$PID/cgroup contents like this:
> 0::/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-f71c3fb8-519d-4e2d-b13e-9252594b1e05.scope
So, it does not make sense to parse it using strings.Split; we can just
cut the prefix and return the rest.
Code tested by TestParseCgroupFromReader.
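
The approach can be sketched as follows; cgroupV2Path is a hypothetical
helper name, and it works on a single line rather than a reader to keep
the example short:

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupV2Path extracts the cgroup path from a cgroup v2 line of
// /proc/$PID/cgroup by cutting the fixed "0::" prefix.
// (Hypothetical helper, for illustration only.)
func cgroupV2Path(line string) (string, error) {
	path, ok := strings.CutPrefix(strings.TrimSuffix(line, "\n"), "0::")
	if !ok {
		return "", fmt.Errorf("unexpected cgroup v2 line %q", line)
	}
	return path, nil
}

func main() {
	fmt.Println(cgroupV2Path("0::/user.slice/user-1000.slice\n"))
	fmt.Println(cgroupV2Path("1:name=systemd:/"))
}
```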
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Remove extra global constants that are only used in a single place and
make the code harder to read.
Rename nanosecondsInSecond -> nsInSec.
This code is tested by unit tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using strings.Cut (added in Go 1.18, see [1]) results in faster and
cleaner code with fewer allocations (as we're not using a slice). This
also drops the check for an extra dash (we're unlikely to get it from the
kernel anyway).
While at it, rename min/max -> from/to to avoid collision with Go
min/max builtins.
This code is tested by TestCPUSetStats* tests.
[1]: https://github.com/golang/go/issues/46336
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since the UID/GID/AdditionalGroups fields are now numeric,
we can address the following TODO item in the code (added
by commit d2f49696 back in 2016):
> TODO: We currently can't do
> this check earlier, but if libcontainer.Process.User was typesafe
> this might work.
Move the check to a much earlier phase, when we're preparing
to start a process in a container.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This addresses the following TODO in the code (added back in 2015
by commit 845fc65e5):
> // TODO: fix libcontainer's API to better support uid/gid in a typesafe way.
Historically, libcontainer internally uses strings for user, group, and
additional (aka supplementary) groups.
Yet, runc receives those credentials as part of runtime-spec's process,
which uses integers for all of them (see [1], [2]).
What happens next is:
1. runc start/run/exec converts those credentials to strings (a User
string containing "UID:GID", and a []string for additional GIDs) and
passes those onto runc init.
2. runc init converts them back to int, in the most complicated way
possible (parsing container's /etc/passwd and /etc/group).
All this conversion and, especially, parsing is totally unnecessary,
but is performed on every container exec (and start).
The only benefit of all this is that a libcontainer user could use user and
group names instead of numeric IDs (but runc itself is not using this
feature, and we don't know if there are any other users of this).
Let's remove this back and forth translation, hopefully increasing
runc exec performance.
The only remaining need to parse /etc/passwd is to set the HOME environment
variable for a specified UID, in case $HOME is not explicitly set in
process.Env. This can now be done right in prepareEnv, which simplifies
the code flow a lot. Alas, we cannot use the standard os/user.LookupId, as
it could cache the host's /etc/passwd or the current user (even with the
osusergo tag).
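
For illustration, a home directory lookup by numeric UID can be sketched
as below. homeFromPasswd is a hypothetical helper (the actual code may
rely on a dedicated package instead), and it takes the file contents as
a string to keep the sketch self-contained:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// homeFromPasswd scans passwd-formatted content
// (name:password:UID:GID:GECOS:home:shell) and returns the home
// directory of the first entry with the given UID.
func homeFromPasswd(content string, uid int) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Split(line, ":")
		if len(fields) < 6 {
			continue // Skip empty or malformed lines.
		}
		id, err := strconv.Atoi(fields[2])
		if err != nil || id != uid {
			continue
		}
		return fields[5], true
	}
	return "", false
}

func main() {
	passwd := "root:x:0:0:root:/root:/bin/sh\n" +
		"user:x:1000:1000::/home/user:/bin/sh\n"

	fmt.Println(homeFromPasswd(passwd, 1000))
	fmt.Println(homeFromPasswd(passwd, 42))
}
```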
PS Note that the structures being changed (initConfig and Process) are
never saved to disk as JSON by runc, so there is no compatibility issue
for runc users.
Still, this is a breaking change in libcontainer, but we never promised
that the libcontainer API would be stable (and there's a special package
that can handle it -- github.com/moby/sys/user). Reflect this in
CHANGELOG.
For 3998.
[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#posix-platform-user
[2]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/specs-go/config.go#L86
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fixes k3s cross-compilation on Windows, broken by commit
1912d5988b ("*: actually support joining a userns with a new
container").
[@kolyshkin: commit message]
Fixes: 1912d5988b
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Similar to when SetAmbient() can fail, runc should be graceful about
ResetAmbient failing.
This functionality previously worked under gvisor, which doesn't
implement ambient capabilities at the moment. The hard error on reset broke gvisor
usage.
Signed-off-by: Evan Phoenix <evan@phx.io>
Use the old package name as an alias to minimize the patch.
No functional change; this just eliminates a bunch of deprecation
warnings.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Currently, libcontainer/devices contains two things:
1. Device-related configuration data structures and accompanying
methods. Those are used by runc itself, mostly by libct/cgroups.
2. A few functions (HostDevices, DeviceFromPath, GetDevices).
Those are not used by runc directly, but have some external users
(cri-o, microsoft/hcsshim), and they also have a few forks
(containerd/pkg/oci, podman/pkg/util).
This commit moves (1) to a new separate package, config (under
libcontainer/cgroups/devices), adding backward-compatible aliases
(marked as deprecated so we will be able to remove those later).
Alas it's not possible to move this to libcontainer/cgroups directly
because some IDs (Type, Rule, Permissions) are too generic, and renaming
them (to DeviceType, DeviceRule, DevicePermissions) will break backward
compatibility (mostly due to Rule being embedded into Device).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This release includes a minor breaking API change that requires us to
rework the types of our wrappers, but there is no practical behaviour
change.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The previous logic caused runc to hang if CloseExecFrom returned an
error: the defer waiting on logsDone never finished because the parent
process was never started (and it controls the closing of logsDone via
its logsPipe).
This moves the defer to after we have started the parent, which means all
the logic related to managing the logsPipe should also be running.
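
The ordering issue can be reduced to the following sketch. The run
function and its failEarly parameter are hypothetical; a goroutine
stands in for the started parent that eventually closes logsDone:

```go
package main

import (
	"errors"
	"fmt"
)

// run models the fixed ordering: the deferred wait on logsDone is only
// registered after the component that closes the channel is running.
func run(failEarly bool) error {
	logsDone := make(chan struct{})

	// Registering `defer func() { <-logsDone }()` at this point would
	// block forever on the early return below, because nothing would
	// ever close logsDone.
	if failEarly {
		return errors.New("early failure before the parent is started")
	}

	// Stand-in for starting the parent, which closes logsDone when done.
	go func() { close(logsDone) }()

	// Safe to wait now: the closer is running.
	defer func() { <-logsDone }()

	return nil
}

func main() {
	fmt.Println(run(true))
	fmt.Println(run(false))
}
```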
Signed-off-by: Evan Phoenix <evan@phx.io>
The current implementation sets all the environment variables passed in
Process.Env in the current process, one by one, then uses os.Environ to
read those back.
As pointed out in [1], this is slow, as runc calls os.Setenv for every
variable, and there may be a few thousand of those. Looking into how
os.Setenv is implemented, it is indeed slow, especially when cgo is
enabled.
Looking into why it was implemented the way it is, I found commit
9744d72c and traced it to [2], which discusses the actual reasons.
It boils down to these two:
- HOME is not passed into container as it is set in setupUser by
os.Setenv and has no effect on config.Env;
- there is a need to deduplicate the environment variables.
Yet it was decided in [2] not to go ahead with this patch, but
later [3] was opened as a carry of this patch, and merged.
Now, from what I see:
1. Passing environment to exec is way faster than using os.Setenv and
os.Environ (tests show ~20x speed improvement in a simple Go test,
and ~3x improvement in real-world test, see below).
2. Setting environment variables in the runc context may result in some
ugly side effects (think GODEBUG, LD_PRELOAD, or _LIBCONTAINER_*).
3. Nothing in the runtime spec says that the environment needs to be
deduplicated, or the order of preference (whether the first or the
last value of a variable with the same name is to be used). We should
stick to what we have in order to maintain backward compatibility.
So, this patch:
- switches to passing env directly to exec;
- adds a deduplication mechanism to retain backward compatibility;
- takes care to set PATH from process.Env in the current process
(so that supplied PATH is used to find the binary to execute),
also to retain backward compatibility;
- adds HOME to process.Env if not set;
- ensures any StartContainer CommandHook entries with no environment
set explicitly are run with the same environment as before. Thanks
to @lifubang who noticed that peculiarity.
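
A minimal sketch of a deduplication step. It assumes that the last value
of a repeated variable wins (which is what calling os.Setenv for each
entry in order would produce) and that the entry keeps the position of
its first occurrence; dedupEnv is a hypothetical helper, not the actual
implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// dedupEnv removes duplicate variables from env. For a repeated name,
// the last value is kept, stored at the position of the first
// occurrence. Entries without "=" or with an empty name are dropped in
// this sketch.
func dedupEnv(env []string) []string {
	seen := make(map[string]int, len(env))
	out := make([]string, 0, len(env))
	for _, kv := range env {
		name, _, ok := strings.Cut(kv, "=")
		if !ok || name == "" {
			continue
		}
		if i, dup := seen[name]; dup {
			out[i] = kv // A later value overrides the earlier one.
			continue
		}
		seen[name] = len(out)
		out = append(out, kv)
	}
	return out
}

func main() {
	fmt.Println(dedupEnv([]string{"A=1", "B=2", "A=3"}))
}
```

The resulting slice can then be passed directly to exec, without
touching the environment of the current process.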
The benchmark added by the previous commit shows ~3x improvement:
                 │   before    │               after                │
                 │   sec/op    │    sec/op     vs base              │
ExecInBigEnv-20    61.53m ± 1%   21.87m ± 16%  -64.46% (p=0.000 n=10)
[1]: https://github.com/opencontainers/runc/pull/1983
[2]: https://github.com/docker-archive/libcontainer/pull/418
[3]: https://github.com/docker-archive/libcontainer/pull/432
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Make CommandHook.Command a pointer, which reduces the amount of data
being copied when using hooks, and allows modifying command hooks.
2. Add SetDefaultEnv, which is to be used by the next commit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a slight refactor of TestExecInEnvironment, making it more
strict with respect to checking the exec output.
1. Explain why DEBUG is added twice to the env.
2. Reuse the execEnv for the check.
3. Make the check more strict -- instead of looking for substrings,
check line by line.
4. Add a check for extra environment variables.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Here's what it shows on my laptop (with -count 10 -benchtime 10s,
summarized by benchstat):
                 │   sec/op    │
ExecTrue-20        8.477m ± 2%
ExecInBigEnv-20    61.53m ± 1%
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This helper was added for runc-dmz in commit dac417174, but runc-dmz was
later removed in commit 871057d, which forgot to remove the helper.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>