runc run: warn on non-empty cgroup

Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupsPath in config.json). While such a
shared configuration might be OK, there are some issues:

 - When each container has its own resource limits, the order in which
   the containers are started determines whose limits will be effectively
   applied.

 - When one of the containers is paused, all the others are paused, too.

 - When a container is paused, any attempt to do runc create/run/exec
   ends up with runc init stuck inside a frozen cgroup.

 - When the systemd cgroup manager is used, this becomes even worse: a
   stop (or even a failed start) of any container results in a
   "stopTransientUnit" command being sent to systemd, so (depending on
   the unit properties) other containers can receive SIGTERM, be killed
   after a timeout, etc.

Any of the above may lead to various hard-to-debug situations in
production (runc init stuck, cgroup removal errors, wrong resource
limits, init not reaping zombies, etc.).
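
As an illustration, here is roughly what such a setup looks like (a
sketch only; the bundle directories and the cgroup path below are made
up, while cgroupsPath is the standard OCI config.json field):

    # Both bundles' config.json contain the same cgroup, e.g.
    #   "linux": { "cgroupsPath": "/shared-group" }
    runc run -d --console-socket "$CONSOLE_SOCKET" --bundle ./ct1 ct1
    runc run -d --console-socket "$CONSOLE_SOCKET" --bundle ./ct2 ct2
    # The containers now share one cgroup, so e.g. freezing one
    # freezes the other as well:
    runc pause ct1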

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change, though, so let's do it in
steps, with the first step being to issue a warning and a deprecation
notice about the non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.
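
For illustration, a rough sketch of what the second start would print
with this patch applied (the messages are the ones added below; the
"WARN[0000]" prefix is merely logrus' default formatting, and the exact
process count depends on the workload):

    runc run -d --console-socket "$CONSOLE_SOCKET" --bundle ./ct2 ct2
    # WARN[0000] container's cgroup is not empty: 1 process(es) found
    # WARN[0000] DEPRECATED: running container in a non-empty cgroup
    #   won't be supported in runc 1.2;
    #   https://github.com/opencontainers/runc/issues/3132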

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Author: Kir Kolyshkin
Date: 2021-08-17 16:56:12 -07:00
parent dd696235a4
commit d08bc0c1b3
2 changed files with 46 additions and 4 deletions

@@ -160,16 +160,37 @@ func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, err
 	} else if !os.IsNotExist(err) {
 		return nil, err
 	}
+
+	cm, err := manager.New(config.Cgroups)
+	if err != nil {
+		return nil, err
+	}
+
+	// Check that cgroup does not exist or empty (no processes).
+	// Note for cgroup v1 this check is not thorough, as there are multiple
+	// separate hierarchies, while both Exists() and GetAllPids() only use
+	// one for "devices" controller (assuming others are the same, which is
+	// probably true in almost all scenarios). Checking all the hierarchies
+	// would be too expensive.
+	if cm.Exists() {
+		pids, err := cm.GetAllPids()
+		// Reading PIDs can race with cgroups removal, so ignore ENOENT and ENODEV.
+		if err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, unix.ENODEV) {
+			return nil, fmt.Errorf("unable to get cgroup PIDs: %w", err)
+		}
+		if len(pids) != 0 {
+			// TODO: return an error.
+			logrus.Warnf("container's cgroup is not empty: %d process(es) found", len(pids))
+			logrus.Warn("DEPRECATED: running container in a non-empty cgroup won't be supported in runc 1.2; https://github.com/opencontainers/runc/issues/3132")
+		}
+	}
+
 	if err := os.MkdirAll(containerRoot, 0o711); err != nil {
 		return nil, err
 	}
 	if err := os.Chown(containerRoot, unix.Geteuid(), unix.Getegid()); err != nil {
 		return nil, err
 	}
-	cm, err := manager.New(config.Cgroups)
-	if err != nil {
-		return nil, err
-	}
 	c := &linuxContainer{
 		id:   id,
 		root: containerRoot,

@@ -347,3 +347,24 @@ function setup() {
[ "$status" -eq 0 ]
[ "$output" = "ok" ]
}
@test "runc run/create should warn about a non-empty cgroup" {
if [[ "$ROOTLESS" -ne 0 ]]; then
requires rootless_cgroup
fi
set_cgroups_path
runc run -d --console-socket "$CONSOLE_SOCKET" ct1
[ "$status" -eq 0 ]
# Run a second container sharing the cgroup with the first one.
runc --debug run -d --console-socket "$CONSOLE_SOCKET" ct2
[ "$status" -eq 0 ]
[[ "$output" == *"container's cgroup is not empty"* ]]
# Same but using runc create.
runc create --console-socket "$CONSOLE_SOCKET" ct3
[ "$status" -eq 0 ]
[[ "$output" == *"container's cgroup is not empty"* ]]
}