This addresses the following TODO in the code (added back in 2015
by commit 845fc65e5):
> // TODO: fix libcontainer's API to better support uid/gid in a typesafe way.
Historically, libcontainer internally uses strings for user, group, and
additional (aka supplementary) groups.
Yet, runc receives those credentials as part of runtime-spec's process,
which uses integers for all of them (see [1], [2]).
What happens next is:
1. runc start/run/exec converts those credentials to strings (a User
string containing "UID:GID", and a []string for additional GIDs) and
passes those onto runc init.
2. runc init converts them back to int, in the most complicated way
possible (parsing container's /etc/passwd and /etc/group).
All this conversion and, especially, parsing is totally unnecessary,
but is performed on every container exec (and start).
The only benefit of all this is, a libcontainer user could use user and
group names instead of numeric IDs (but runc itself is not using this
feature, and we don't know if there are any other users of this).
Let's remove this back and forth translation, hopefully increasing
runc exec performance.
The only remaining need to parse /etc/passwd is to set HOME environment
variable for a specified UID, in case $HOME is not explicitly set in
process.Env. This can now be done right in prepareEnv, which simplifies
the code flow a lot. Alas, we can not use standard os/user.LookupId, as
it could cache host's /etc/passwd or the current user (even with the
osusergo tag).
PS Note that the structures being changed (initConfig and Process) are
never saved to disk as JSON by runc, so there is no compatibility issue
for runc users.
Still, this is a breaking change in libcontainer, but we never promised
that libcontainer API will be stable (and there's a special package
that can handle it -- github.com/moby/sys/user). Reflect this in
CHANGELOG.
For 3998.
[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#posix-platform-user
[2]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/specs-go/config.go#L86
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The current implementation sets all the environment variables passed in
Process.Env in the current process, one by one, then uses os.Environ to
read those back.
As pointed out in [1], this is slow, as runc calls os.Setenv for every
variable, and there may be a few thousands of those. Looking into how
os.Setenv is implemented, it is indeed slow, especially when cgo is
enabled.
Looking into why it was implemented the way it is, I found commit
9744d72c and traced it to [2], which discusses the actual reasons.
It boils down to these two:
- HOME is not passed into container as it is set in setupUser by
os.Setenv and has no effect on config.Env;
- there is a need to deduplicate the environment variables.
Yet it was decided in [2] to not go ahead with this patch, but
later [3] was opened with the carry of this patch, and merged.
Now, from what I see:
1. Passing environment to exec is way faster than using os.Setenv and
os.Environ (tests show ~20x speed improvement in a simple Go test,
and ~3x improvement in real-world test, see below).
2. Setting environment variables in the runc context may result is some
ugly side effects (think GODEBUG, LD_PRELOAD, or _LIBCONTAINER_*).
3. Nothing in runtime spec says that the environment needs to be
deduplicated, or the order of preference (whether the first or the
last value of a variable with the same name is to be used). We should
stick to what we have in order to maintain backward compatibility.
So, this patch:
- switches to passing env directly to exec;
- adds deduplication mechanism to retain backward compatibility;
- takes care to set PATH from process.Env in the current process
(so that supplied PATH is used to find the binary to execute),
also to retain backward compatibility;
- adds HOME to process.Env if not set;
- ensures any StartContainer CommandHook entries with no environment
set explicitly are run with the same environment as before. Thanks
to @lifubang who noticed that peculiarity.
The benchmark added by the previous commit shows ~3x improvement:
│ before │ after │
│ sec/op │ sec/op vs base │
ExecInBigEnv-20 61.53m ± 1% 21.87m ± 16% -64.46% (p=0.000 n=10)
[1]: https://github.com/opencontainers/runc/pull/1983
[2]: https://github.com/docker-archive/libcontainer/pull/418
[3]: https://github.com/docker-archive/libcontainer/pull/432
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>