80 Commits

Author SHA1 Message Date
Kir Kolyshkin
002c92f1b2 libct/cg.WriteCgroupProc: use fscommon.OpenFile
...and drop os.O_CREATE|os.O_TRUNC as those are definitely not needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 17:08:09 -07:00
Kir Kolyshkin
e25b8cfcd5 libct/cg/utils: use fscommon.ReadFile
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 14:07:57 -07:00
Kir Kolyshkin
360981ae1d libct/cgroups: rewrite getHugePageSizeFromFilenames
This is a function to convert huge page sizes (obtained by reading
/sys/kernel/mm/hugepages directory entries) to strings used for hugetlb
cgroup controller resource files. Those strings are then used to get the
hugetlb resource statistics.

This function used an external library and floating point numbers, and
could (theoretically) produce invalid values, since the kernel only uses
KB, MB, and GB suffixes.

Rewrite it to produce the same strings as used in the kernel (see [1]).
As a result, it's also faster, more future-proof (entries that do not
start with "hugepages-" and/or have an incorrect suffix are skipped), and
does more input sanity checks. As a side effect, libcontainer no longer
depends on docker/go-units.

While at it, add more test cases.

Before:
	BenchmarkGetHugePageSize-8       	  187452	      6265 ns/op
	BenchmarkGetHugePageSizeImpl-8   	  396769	      2998 ns/op

After:
	BenchmarkGetHugePageSize-8       	  222898	      4554 ns/op
	BenchmarkGetHugePageSizeImpl-8   	 4738924	       241 ns/op

NOTE on removing HugePageSizeUnitList -- this was added by commit
6f77e35da and was used by kubernetes code in [2], which was later
superseded by [3], so there are (hopefully) no external users.
If there are any, they should not be doing that.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?id=eff48ddeab782e35e58ccc8853f7386bbae9dec4#n574
[2] https://github.com/kubernetes/kubernetes/pull/78495
[3] https://github.com/kubernetes/kubernetes/pull/84154

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-30 10:58:31 -07:00
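The name conversion described in 360981ae1d can be sketched as follows. This is a minimal, integer-only illustration, not the code from the commit: the helper name `hugePageSizeFromName` is hypothetical, and it assumes sizes are whole multiples of 1024 when they exceed one unit (true for the power-of-two sizes the kernel exposes).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hugePageSizeFromName is a hypothetical helper (not runc's actual code).
// It converts a /sys/kernel/mm/hugepages entry name such as
// "hugepages-2048kB" into the form used in hugetlb cgroup file names
// ("2MB"). Names with an unexpected prefix or suffix yield ok == false.
func hugePageSizeFromName(name string) (size string, ok bool) {
	const prefix, suffix = "hugepages-", "kB"
	if !strings.HasPrefix(name, prefix) || !strings.HasSuffix(name, suffix) {
		return "", false
	}
	digits := name[len(prefix) : len(name)-len(suffix)]
	kb, err := strconv.ParseUint(digits, 10, 64)
	if err != nil || kb == 0 {
		return "", false
	}
	// Integer-only scaling: move up one unit while the value divides
	// evenly by 1024, stopping at GB (the largest suffix considered here).
	units := []string{"KB", "MB", "GB"}
	i := 0
	for i < len(units)-1 && kb%1024 == 0 {
		kb /= 1024
		i++
	}
	return strconv.FormatUint(kb, 10) + units[i], true
}

func main() {
	for _, n := range []string{"hugepages-64kB", "hugepages-2048kB", "hugepages-1048576kB", "bogus"} {
		size, ok := hugePageSizeFromName(n)
		fmt.Println(n, size, ok)
	}
}
```

Avoiding floating point and an external units library keeps the result limited to the KB, MB, and GB suffixes mentioned in the commit message.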
Kir Kolyshkin
8ceae9f766 libct/cgroups/GetHugePageSize: use Readdirnames
ioutil.ReadDir does a stat() on every entry and returns a slice of
os.FileInfo structures. What we need here is just a file name.

This change both simplifies and speeds up the code a bit.

Before:
	BenchmarkGetHugePageSize-8       	  115213	      9400 ns/op

After:
	BenchmarkGetHugePageSize-8       	  190326	      6187 ns/op

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-22 16:08:56 -07:00
Kir Kolyshkin
19be8e5ba5 libct/cgroups.RemovePaths: speedup
Using os.RemoveAll has the following two issues:

 1. it tries to remove all files, which does not make sense for cgroups;
 2. it tries unlink(2), which fails on directories, and then rmdir(2).

Let's reuse our RemovePath instead, and add warnings and errors logging.

PS I am somewhat hesitant to remove the weird checking by means of stat,
as it might break something. Unfortunately, neither commit 6feb7bda04
nor the PR that contains it [1] explains what kind of weird errors were
seen from os.RemoveAll. Most probably our code won't return any bogus
errors, but let's keep the old code to be on the safe side.

[1] https://github.com/docker-archive/libcontainer/pull/308

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin
3f14242e0a libct/cgroups: move RemovePath from fs2
This is to be used by RemovePaths.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin
254d23b964 libc/cgroups: empty map in RemovePaths
RemovePaths() deletes elements from the paths map for paths that have
been successfully removed.

However, it does not empty the map itself (which is needed since, AFAIK,
the Go garbage collector does not shrink maps); all its callers do that.

Move this operation from callers to RemovePaths.

No functional change, except the old map should be garbage collected now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin
89516d17dd libct/cgroups/readProcsFile: ret error if scan failed
Not sure why but the errors from scanner were ignored. Such errors
can happen if open(2) has succeeded but the subsequent read(2) fails.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-17 12:33:01 -07:00
Kir Kolyshkin
0681d456fc libct/cgroups/utils: move cgroup v1 code to separate file
In most projects, "utils" is a big mess, and this is not an exception.
Try to clean it up a bit by moving cgroup v1 specific code to a separate
source file.

There are no code changes in this commit, just moving it from one file
to another.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:07 -07:00
Kir Kolyshkin
7db2d3e146 libcontainer/cgroups: rm FindCgroupMountpointDir
This function is cgroupv1-specific, is only used once, and its name
is very close to the name of another function, FindCgroupMountpoint.

Inline it into the (only) caller.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:15 -07:00
Kir Kolyshkin
d244b4058e libct/cgroups: improve ParseCgroupFile docs
In particular, state that for cgroup v2 the result is very different.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:08 -07:00
Kir Kolyshkin
5785aabc13 libct/cgroups: make isSubsystemAvailable v1-specific
This function is only called from cgroupv1 code, so there is no need
for it to implement cgroupv2 stuff.

Make it v1-specific, and panic if it is called from v2 code (since this
is an internal function, the panic would mean incorrect runc code).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:04 -07:00
Kir Kolyshkin
142d0f2d5d libct/cgroups/utils: make FindCgroupMountpoint* v1-specific
It's bad and wrong to use these functions for any cgroupv2 code,
and there are no existing users (in runc, at least).

Make them return an error in such case.

Also, remove the cgroupv2-specific handling from
findCgroupMountpointAndRootFromReader().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:58 -07:00
Kir Kolyshkin
44b75e760e libct/cgroups: separate getCgroupMountsV1
This function should not really be used for cgroupv2 code.
Currently it is used in kubernetes code, so we can't remove
the v2 case yet.

Add a TODO item to remove v2 code once kubernetes is converted
to not use it, and separate out v1 code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:06 -07:00
Kir Kolyshkin
3834222d88 libct/cgroups/utils: getControllerPath return err for v2
This function is not used and was never used in any cgroupv2 code.

To have it stay that way, let it return an error in case it's called
for v2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 20:23:59 -07:00
Kir Kolyshkin
4189cb65f8 cgroups: remove cgroup.Resources.CpuMax
This (and the converting function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let the fs2 do the same.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:15:38 -07:00
Kir Kolyshkin
3c6e8ac4d2 cgroupv2: set mem+swap to max if mem set to max
... and mem+swap is not explicitly set otherwise.

This ensures compatibility with cgroupv1 controller which interprets
things this way.

With this fixed, we can finally enable swap tests for cgroupv2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-22 21:32:16 -07:00
Kir Kolyshkin
2db3240f35 libct/cgroups: rm GetClosestMountpointAncestor
The function GetClosestMountpointAncestor is not very efficient,
does not really belong to cgroup package, and is only used once
(from fs/cpuset.go).

Remove it, replacing with the implementation based on moby/sys/mountinfo
parser.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 17:32:06 -07:00
Kir Kolyshkin
f160352682 libct/cgroup: prep to rm GetClosestMountpointAncestor
This function is not very efficient, does not really belong to cgroup
package, and is only used once (from fs/cpuset.go).

Prepare to remove it by replacing it with an implementation based on
the parser from github.com/moby/sys/mountinfo.

This commit is here to make sure the proposed replacement passes the
unit test.

Funny, but the unit test needs to be slightly modified since it
supplies the wrong mountinfo (space as the first character, empty line
at the end).

Validated by

 $ go test -v -run Ance
 === RUN   TestGetClosestMountpointAncestor
 --- PASS: TestGetClosestMountpointAncestor (0.00s)
 PASS
 ok  	github.com/opencontainers/runc/libcontainer/cgroups	0.002s

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 16:26:16 -07:00
lifubang
a70f354680 let runc disable swap in cgroup v2
In cgroup v2, when memory and memorySwap are set to the same value, which is greater than zero,
runc should write zero in `memory.swap.max` to disable swap.

Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-05-03 20:57:36 +08:00
Kenta Tada
e58a406b77 libcontainer: remove unneeded import
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2020-04-09 20:14:39 +09:00
Kir Kolyshkin
c86be8a2c1 cgroupv2: fix setting MemorySwap
The resources.MemorySwap field from OCI is memory+swap, while cgroupv2
has a separate swap limit, so subtract memory from the limit (and make
sure values are set and sane).

Make sure to set MemorySwapMax for systemd, too. Since systemd does not
have MemorySwapMax for cgroupv1, it is only needed for v2 driver.

[v2: return -1 on any negative value, add unit test]
[v3: treat any negative value other than -1 as error]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-07 20:45:53 -07:00
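The conversion described in c86be8a2c1 can be sketched like this. It is an illustration under stated assumptions, not the function from the commit: the name `swapLimitV2` is hypothetical, 0 is taken to mean "not set", -1 is taken to mean "unlimited", and a positive memory limit is assumed to be required whenever memory+swap is set.

```go
package main

import (
	"errors"
	"fmt"
)

// swapLimitV2 derives a cgroup v2 swap-only limit from the OCI
// memory+swap value and the memory limit (hypothetical helper).
func swapLimitV2(memorySwap, memory int64) (int64, error) {
	switch {
	case memorySwap == -1 || memorySwap == 0:
		// -1 is unlimited and 0 is "not set": pass both through unchanged.
		return memorySwap, nil
	case memorySwap < -1:
		// Any negative value other than -1 is treated as an error.
		return 0, errors.New("invalid memory+swap value")
	case memory <= 0:
		return 0, errors.New("memory+swap is set but memory limit is not")
	case memorySwap < memory:
		return 0, errors.New("memory+swap must not be less than memory")
	}
	// cgroup v2 limits swap separately, so subtract the memory part.
	return memorySwap - memory, nil
}

func main() {
	fmt.Println(swapLimitV2(300<<20, 100<<20))
	fmt.Println(swapLimitV2(-1, 100<<20))
	fmt.Println(swapLimitV2(50<<20, 100<<20))
}
```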
Kir Kolyshkin
b2272b2cba libcontainer: use errors.Is() and errors.As()
Make use of errors.Is() and errors.As() where appropriate to check
the underlying error. The biggest motivation is to simplify the code.

The feature requires go 1.13 but since merging #2256 we are already
not supporting go 1.12 (which is an unsupported release anyway).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 20:34:01 -07:00
Kir Kolyshkin
c39f87a47a Revert "Merge pull request #2280 from kolyshkin/errors-unwrap"
Using errors.Unwrap() is not the best thing to do, since it returns
nil in case of an error which was not wrapped. Moreover, the
errors package provides more elegant ways to check for underlying
errors, such as errors.As() and errors.Is().

This reverts commit f8e138855d, reversing
changes made to 6ca9d8e6da.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 19:41:11 -07:00
Kir Kolyshkin
272c83e169 libct/cgroups: use errors.Unwrap
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-31 20:07:04 -07:00
Kir Kolyshkin
b45db5d3b2 libcontainer/cgroup: obsolete Get*Cgroup for v2
These functions should not be called from any code handling
the cgroup2 unified hierarchy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 19:20:00 -07:00
Kir Kolyshkin
5542a2c77d libcontainer/cgroups: GetAllPids: optimize
1. Return earlier if there is an error.

2. Do not use filepath.Split on every entry, use info.Name() instead.

3. Make readProcsFile() accept file name as an argument, to avoid
   unnecessary file name and directory splitting and merging.

4. Skip on info.IsDir() -- this avoids an error when cgroup name is
   set to "cgroup.procs".

This is still not very good since filepath.Walk() performs an unnecessary
stat(2) on every entry, but better than before.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 12:27:36 -07:00
Akihiro Suda
aa269315a4 cgroup2: add CpuMax conversion
Fix #2243

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:58:39 +09:00
Akihiro Suda
64e9a97981 cgroup2: fix conversion
* TestConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000

Fix #2244
Follow-up to #2212 #2213

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:57:07 +09:00
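The values quoted in 64e9a97981 are consistent with a linear mapping between the v1 and v2 ranges in which zero means "not set". The sketch below assumes that linear mapping and the ranges 2..262144 for cpu.shares, 10..1000 for blkio.weight, and 1..10000 for the v2 weights; the function names are hypothetical and inputs are assumed to be either 0 or within range.

```go
package main

import "fmt"

// cpuSharesToWeight maps cgroup v1 cpu.shares (2..262144) onto
// cgroup v2 cpu.weight (1..10000). Zero is passed through as "not set".
func cpuSharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	return 1 + ((shares-2)*9999)/262142
}

// blkioWeightToIOWeight maps cgroup v1 blkio.weight (10..1000) onto
// cgroup v2 io.weight (1..10000). Zero is passed through as "not set".
func blkioWeightToIOWeight(weight uint16) uint64 {
	if weight == 0 {
		return 0
	}
	return 1 + (uint64(weight)-10)*9999/990
}

func main() {
	fmt.Println(cpuSharesToWeight(0), cpuSharesToWeight(2), cpuSharesToWeight(262144))
	fmt.Println(blkioWeightToIOWeight(0), blkioWeightToIOWeight(10), blkioWeightToIOWeight(1000))
}
```

Handling zero before the arithmetic matters because subtracting the range minimum from an unsigned zero would wrap around and produce a huge value.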
Boris Popovschi
4b8134f63b Convert blkioWeight to io.weight properly
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-18 15:44:07 +02:00
Boris Popovschi
7c439cc6f6 Added conversion for cpu.weight v2
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-12 11:32:34 +02:00
Akihiro Suda
74a3fe5d1b cgroup2: do not parse /proc/cgroups
/proc/cgroups is meaningless for v2 and should be ignored.

https://github.com/torvalds/linux/blob/v5.3/Documentation/admin-guide/cgroup-v2.rst#deprecated-v1-core-features

* Now GetAllSubsystems() parses /sys/fs/cgroup/cgroup.controllers, not /proc/cgroups.
  The function result also contains "pseudo" controllers: {"devices", "freezer"}.
  As it is hard to detect availability of pseudo controllers, pseudo controllers
  are always assumed to be available.

* Now IOGroupV2.Name() returns "io", not "blkio"

Fix #2155 #2156

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-28 00:00:33 +09:00
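The behaviour described in 74a3fe5d1b, reading the space-separated controller list and always adding the pseudo controllers, could look roughly like this. The helper name `parseControllers` is hypothetical, and the input is passed in as a string so the sketch does not depend on a cgroup v2 mount being present.

```go
package main

import (
	"fmt"
	"strings"
)

// parseControllers builds a subsystem list from the contents of a
// cgroup v2 cgroup.controllers file (hypothetical helper).
func parseControllers(data string) []string {
	// Pseudo controllers are hard to detect, so they are always
	// assumed to be available.
	subsystems := []string{"devices", "freezer"}
	// The file holds controller names separated by whitespace.
	return append(subsystems, strings.Fields(data)...)
}

func main() {
	fmt.Println(parseControllers("cpuset cpu io memory pids\n"))
}
```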
Michael Crosby
b28f58f31b Set unified mountpoint in find mnt func
This is needed for the fsv2 cgroups to work when there is a unified mountpoint.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-15 15:40:03 -04:00
Giuseppe Scrivano
1932917b71 libcontainer: add initial support for cgroups v2
Allow setting which subsystems are used by
libcontainer/cgroups/fs.Manager.

subsystemsUnified is used on a system running with cgroups v2 unified
mode.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:25 +02:00
Odin Ugedal
6f77e35daf Export list of HugePageSizeUnits
This will allow others to import it instead of copying it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 20:17:30 +02:00
Odin Ugedal
c6445b1c1c Add tests for GetHugePageSize
Add tests to avoid regressions

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 17:27:32 +02:00
Odin Ugedal
273e7b74a7 Fix cgroup hugetlb size prefix for kB
The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).

The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with small
amounts of ram (see
https://github.com/kubernetes/kubernetes/issues/77169) running a kernel
with config flag CONFIG_HUGETLBFS=y

As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.

Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"

And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-29 21:52:43 +02:00
Tom Godkin
bdf3524b34 Retry adding pids to cgroups when EINVAL occurs
The kernel will sometimes return EINVAL when writing a pid to a
cgroup.procs file. It does so when the task being added still has the
state TASK_NEW.

See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286

Co-authored-by: Danail Branekov <danailster@gmail.com>

Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
2018-12-17 15:34:47 +00:00
JoeWrightss
769d6c4a75 Fix some typos
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-09 23:52:54 +08:00
Michael Crosby
76520a4bf0 Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint
Respect container's cgroup path
2018-11-16 14:06:54 -05:00
Yuanhong Peng
df3fa115f9 Add support for cgroup namespace
Cgroup namespace can be configured in `config.json` as other
namespaces. Here is an example:

```
"namespaces": [
	{
		"type": "pid"
	},
	{
		"type": "network"
	},
	{
		"type": "ipc"
	},
	{
		"type": "uts"
	},
	{
		"type": "mount"
	},
	{
		"type": "cgroup"
	}
],

```

Note that if you want to run a container that shares a cgroup ns with
another container, then it's strongly recommended that you set a
proper `CgroupsPath` for both containers (the second container's cgroup
path must be a subdirectory of the first one). Otherwise there might be
some unexpected results.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-10-31 10:51:43 -04:00
Danail Branekov
a1d5398afa Respect container's cgroup path
Respect the container's cgroup path when finding the container's
cgroup mount point, which is useful in multi-tenant environments, where
containers have their own unique cgroup mounts

Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
2018-09-25 17:43:36 +01:00
Aleksa Sarai
578fe65e4f merge branch 'pr-1817'
Fix duplicate entries and missing entries in getCgroupMountsHelper
  Add test for testing cgroup mounts on bedrock linux
  Stop relying on number of subsystems for cgroups

LGTMs: @crosbymichael @cyphar
Closes #1817
2018-09-19 19:48:17 +10:00
Yan Zhu
feb90346e0 doc: fix typo
Signed-off-by: Yan Zhu <yanzhu@alauda.io>
2018-09-07 11:58:59 +08:00
Jay Kamat
a2faaa1317 Fix duplicate entries and missing entries in getCgroupMountsHelper
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2018-07-31 20:12:18 -07:00
Daniel Dao
5ee0648bfb Stop relying on number of subsystems for cgroups
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.

Without the 'all' flag, ignore duplicate subsystems after the first.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2018-06-24 00:00:58 +01:00
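The "tick off subsystems" idea from 5ee0648bfb can be sketched on already-parsed mount entries. The type `cgroupMount` and the function `selectMounts` are hypothetical simplifications; the real code parses mountinfo, which is omitted here.

```go
package main

import "fmt"

// cgroupMount is a simplified, hypothetical mount record.
type cgroupMount struct {
	Mountpoint string
	Subsystems []string
}

// selectMounts returns the mounts that carry wanted subsystems. Without
// all, each subsystem is reported only for the first mount that has it,
// and scanning stops once every wanted subsystem has been ticked off.
func selectMounts(mounts []cgroupMount, wanted []string, all bool) []cgroupMount {
	// found[s] stays false until subsystem s is seen on some mount.
	found := make(map[string]bool, len(wanted))
	for _, s := range wanted {
		found[s] = false
	}
	remaining := len(found)

	var out []cgroupMount
	for _, m := range mounts {
		if remaining == 0 && !all {
			break
		}
		var subs []string
		for _, s := range m.Subsystems {
			seen, known := found[s]
			if !known || (seen && !all) {
				continue // not requested, or a duplicate we ignore
			}
			if !seen {
				found[s] = true
				remaining--
			}
			subs = append(subs, s)
		}
		if len(subs) > 0 {
			out = append(out, cgroupMount{Mountpoint: m.Mountpoint, Subsystems: subs})
		}
	}
	return out
}

func main() {
	mounts := []cgroupMount{
		{"/sys/fs/cgroup/cpu,cpuacct", []string{"cpu", "cpuacct"}},
		{"/other/cpu", []string{"cpu"}},
		{"/sys/fs/cgroup/memory", []string{"memory"}},
	}
	fmt.Println(selectMounts(mounts, []string{"cpu", "memory"}, false))
	fmt.Println(selectMounts(mounts, []string{"cpu", "memory"}, true))
}
```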
Michael Crosby
18cd7e06f7 Merge pull request #1372 from cloudfoundry-incubator/cpuset-mount-root
Handle container creation when cgroups have already been mounted in another location
2017-05-25 09:53:57 -07:00
Aleksa Sarai
baeef29858 rootless: add rootless cgroup manager
The rootless cgroup manager acts as a noop for all set and apply
operations. It is just used for rootless setups. Currently this is far
too simple (we need to add opportunistic cgroup management), but is good
enough as a first-pass at a noop cgroup manager.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
Craig Furman
f5c5aac958 Create containers when cgroups already mounted
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.

This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.

Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.

Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
2017-03-15 10:10:30 +00:00
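Choosing "the most deeply nested mountpoint that contains the directory", as f5c5aac958 describes, can be sketched on a plain list of mountpoints. The function name `deepestMountpoint` is hypothetical, and reading the mountpoints from /proc/self/mountinfo is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// deepestMountpoint returns the longest mountpoint that is dir itself
// or one of its ancestors, or "" if none matches (hypothetical helper).
func deepestMountpoint(dir string, mountpoints []string) string {
	dir = filepath.Clean(dir)
	best := ""
	for _, mp := range mountpoints {
		mp = filepath.Clean(mp)
		// Compare on path boundaries so that /a/cpuset2 is not
		// mistaken for an ancestor of /a/cpuset/x.
		prefix := strings.TrimSuffix(mp, "/") + "/"
		if dir != mp && !strings.HasPrefix(dir, prefix) {
			continue
		}
		if len(mp) > len(best) {
			best = mp
		}
	}
	return best
}

func main() {
	mps := []string{"/", "/sys/fs/cgroup", "/sys/fs/cgroup/cpuset", "/sys/fs/cgroup/cpuset2"}
	fmt.Println(deepestMountpoint("/sys/fs/cgroup/cpuset/container1", mps))
}
```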
Daniel, Dao Quang Minh
0fefa36f3a Merge pull request #1278 from datawolf/scanner
move error check out of the for loop
2017-01-20 17:49:44 +00:00