runc run: warn on non-empty cgroup

Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupsPath in config.json). While such a
shared configuration might be OK, there are some issues:

 - When each container has its own resource limits, the order in which
   the containers are started determines whose limits will be effectively
   applied.

 - When one of the containers is paused, all the others are paused, too.

 - When a container is paused, any attempt to do runc create/run/exec
   ends up with runc init stuck inside a frozen cgroup.

 - When the systemd cgroup manager is used, this becomes even worse: a
   stop (or even a failed start) of any container results in a
   "stopTransientUnit" command being sent to systemd, so (depending on
   the unit properties) other containers can receive SIGTERM, be killed
   after a timeout, etc.

Any of the above may lead to various hard-to-debug situations in
production (runc init stuck, cgroup removal errors, wrong resource
limits, init not reaping zombies, etc.).
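
As an illustration, here is roughly what such a setup looks like (a
sketch only; the bundle directories and the cgroup path below are made
up, while cgroupsPath is the standard OCI config.json field):

    # Both bundles' config.json contain the same cgroup, e.g.
    #   "linux": { "cgroupsPath": "/shared-group" }
    runc run -d --console-socket "$CONSOLE_SOCKET" --bundle ./ct1 ct1
    runc run -d --console-socket "$CONSOLE_SOCKET" --bundle ./ct2 ct2
    # The containers now share one cgroup, so e.g. freezing one
    # freezes the other as well:
    runc pause ct1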

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change, though, so let's do it in
steps, with the first step being to issue a warning and a deprecation
notice about the non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.
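
For illustration, a rough sketch of what the second start would print
with this patch applied (the messages are the ones added below; the
"WARN[0000]" prefix is merely logrus' default formatting, and the exact
process count depends on the workload):

    runc run -d --console-socket "$CONSOLE_SOCKET" --bundle ./ct2 ct2
    # WARN[0000] container's cgroup is not empty: 1 process(es) found
    # WARN[0000] DEPRECATED: running container in a non-empty cgroup
    #   won't be supported in runc 1.2;
    #   https://github.com/opencontainers/runc/issues/3132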

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Author: Kir Kolyshkin
Date: 2021-08-17 16:56:12 -07:00
parent dd696235a4
commit d08bc0c1b3
2 changed files with 46 additions and 4 deletions

@@ -160,16 +160,37 @@ func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, err
 	} else if !os.IsNotExist(err) {
 		return nil, err
 	}
+
+	cm, err := manager.New(config.Cgroups)
+	if err != nil {
+		return nil, err
+	}
+
+	// Check that cgroup does not exist or empty (no processes).
+	// Note for cgroup v1 this check is not thorough, as there are multiple
+	// separate hierarchies, while both Exists() and GetAllPids() only use
+	// one for "devices" controller (assuming others are the same, which is
+	// probably true in almost all scenarios). Checking all the hierarchies
+	// would be too expensive.
+	if cm.Exists() {
+		pids, err := cm.GetAllPids()
+		// Reading PIDs can race with cgroups removal, so ignore ENOENT and ENODEV.
+		if err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, unix.ENODEV) {
+			return nil, fmt.Errorf("unable to get cgroup PIDs: %w", err)
+		}
+		if len(pids) != 0 {
+			// TODO: return an error.
+			logrus.Warnf("container's cgroup is not empty: %d process(es) found", len(pids))
+			logrus.Warn("DEPRECATED: running container in a non-empty cgroup won't be supported in runc 1.2; https://github.com/opencontainers/runc/issues/3132")
+		}
+	}
+
 	if err := os.MkdirAll(containerRoot, 0o711); err != nil {
 		return nil, err
 	}
 	if err := os.Chown(containerRoot, unix.Geteuid(), unix.Getegid()); err != nil {
 		return nil, err
 	}
-	cm, err := manager.New(config.Cgroups)
-	if err != nil {
-		return nil, err
-	}
 	c := &linuxContainer{
 		id:   id,
 		root: containerRoot,

@@ -347,3 +347,24 @@ function setup() {
[ "$status" -eq 0 ]
[ "$output" = "ok" ]
}
@test "runc run/create should warn about a non-empty cgroup" {
if [[ "$ROOTLESS" -ne 0 ]]; then
requires rootless_cgroup
fi
set_cgroups_path
runc run -d --console-socket "$CONSOLE_SOCKET" ct1
[ "$status" -eq 0 ]
# Run a second container sharing the cgroup with the first one.
runc --debug run -d --console-socket "$CONSOLE_SOCKET" ct2
[ "$status" -eq 0 ]
[[ "$output" == *"container's cgroup is not empty"* ]]
# Same but using runc create.
runc create --console-socket "$CONSOLE_SOCKET" ct3
[ "$status" -eq 0 ]
[[ "$output" == *"container's cgroup is not empty"* ]]
}