chown cgroup to process uid in container namespace

Delegating cgroups to the container enables more complex workloads,
including systemd-based workloads.  The OCI runtime-spec was
recently updated to explicitly admit such delegation, through
specification of cgroup ownership semantics:

  https://github.com/opencontainers/runtime-spec/pull/1123

Pursuant to the updated OCI runtime-spec, change the ownership of
the container's cgroup directory and particular files therein, when
using cgroups v2 and when the cgroupfs is to be mounted read/write.

As a result of this change, systemd workloads can run in isolated
user namespaces on OpenShift when the sandbox's cgroupfs is mounted
read/write.

It might be possible to implement this feature in other cgroup
managers, but that work is deferred.

Signed-off-by: Fraser Tweedale <ftweedal@redhat.com>
This commit is contained in:
Fraser Tweedale
2021-06-16 15:30:42 +10:00
parent 6ff042023c
commit 35d20c4e0b
4 changed files with 150 additions and 0 deletions

View File

@@ -41,6 +41,13 @@ type Cgroup struct {
// Rootless tells if rootless cgroups should be used.
Rootless bool
// The host UID that should own the cgroup, or nil to accept
// the default ownership. This should only be set when the
// cgroupfs is to be mounted read/write.
// Not all cgroup manager implementations support changing
// the ownership.
OwnerUID *int `json:"owner_uid,omitempty"`
}
type Resources struct {