
93 Commits

Author SHA1 Message Date
Kir Kolyshkin
56e478046a *: ignore errorlint warnings about unix.* errors
Errors from unix.* are always bare and thus can be used directly.

Add //nolint:errorlint annotation to ignore errors such as these:

libcontainer/system/xattrs_linux.go:18:7: comparing with == will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
	case errno == unix.ERANGE:
	     ^
libcontainer/container_linux.go:1259:9: comparing with != will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
					if e != unix.EINVAL {
					   ^
libcontainer/rootfs_linux.go:919:7: comparing with != will fail on wrapped errors. Use errors.Is to check for a specific error (errorlint)
			if err != unix.EINVAL && err != unix.EPERM {
			   ^
libcontainer/rootfs_linux.go:1002:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
			switch err {
			^

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
8f1b4d4a6f libct/cg: mv fscommon.{Open,Read,Write}File to cgroups
This is a better place as cgroups itself is using these.
Should help with moving more stuff common in between fs and fs2 to
fscommon.

Looks big, but this is just moving the code around:

 fscommon/{fscommon,open}.go -> cgroups/file.go
 fscommon/fscommon_test.go   -> cgroups/file_test.go

and fixes for TestMode moved to a different package.

There's no functional change.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-13 12:38:21 -07:00
Kir Kolyshkin
e6048715e4 Use gofumpt to format code
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.

Brought to you by

	git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w

Looking at the diff, all these changes make sense.

Also, replace gofmt with gofumpt in golangci.yml.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-01 12:17:27 -07:00
Sebastiaan van Stijn
4316df8b53 libcontainer/system: move userns utilities to separate package
Moving these utilities to a separate package, so that consumers of this
package don't have to pull in the whole "system" package.

Looking at uses of these utilities (outside of runc itself);

`RunningInUserNS()` is used by [various external consumers][1],
so adding a "Deprecated" alias for this.

[1]: https://grep.app/search?current=2&q=.RunningInUserNS

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-04 22:42:03 +02:00
Daniel Dao
8c7ece1e6d fs2: fallback to setting io.weight if io.bfq.weight
If bfq is not loaded, then io.bfq.weight is not available. io.weight
should always be available and is the next best equivalent.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-03-05 13:55:36 +00:00
Daniel Dao
c3ffd2ef81 Do not convert blkio weight value using blkio->io conversion scheme
The bfq weight controller (i.e. io.bfq.weight, if present) is still using the
same bfq weight scheme (i.e. 1->1000, see [1]). Unfortunately, the
documentation for this was wrong, and was only fixed recently [2].

Therefore, if we map blkio weight to io.bfq.weight, there's no need to
do any conversion. Otherwise, we will try to write an invalid value, which
results in an error such as:

```
time="2021-02-03T14:55:30Z" level=error msg="container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write \"7475\": write /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup/io.bfq.weight: numerical result out of range"
```

[1] https://github.com/torvalds/linux/blob/master/Documentation/block/bfq-iosched.rst
[2] 65752aef0a

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-02-23 19:46:16 -08:00
Kir Kolyshkin
a99ecc9ea2 libct/cg/utils: silence a linter warning
> libcontainer/cgroups/utils.go:282:4: SA4006: this value of `paths` is never used (staticcheck)
>			paths = make(map[string]string)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-03 10:24:27 -08:00
Aleksa Sarai
8d860c69ad merge branch 'pr-2634'
Cory Bennett (1):
  don't panic when /sys/fs/cgroup is missing for rootless

LGTMs: @AkihiroSuda @cyphar
Closes #2634
2020-10-29 15:59:01 +11:00
Mrunal Patel
10e5ab7966 Merge pull request #2635 from kolyshkin/fscommon-III
libct/cg: introduce and use fscommon.OpenFile
2020-10-22 20:59:56 -07:00
Cory Bennett
939ad4e3fc don't panic when /sys/fs/cgroup is missing for rootless
Signed-off-by: Cory Bennett <cbennett@netflix.com>
2020-10-15 15:52:19 +00:00
Akihiro Suda
bb539a9965 Merge pull request #2628 from thaJeztah/linting_foo
fix some linting issues
2020-10-06 20:10:40 +09:00
Kir Kolyshkin
002c92f1b2 libct/cg.WriteCgroupProc: use fscommon.OpenFile
...and drop os.O_CREATE|os.O_TRUNC as those are definitely not needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 17:08:09 -07:00
Kir Kolyshkin
e25b8cfcd5 libct/cg/utils: use fscommon.ReadFile
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-05 14:07:57 -07:00
Sebastiaan van Stijn
e8eb8000f1 fix some linting issues
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-02 10:21:54 +02:00
Kir Kolyshkin
360981ae1d libct/cgroups: rewrite getHugePageSizeFromFilenames
This is a function to convert huge page sizes (obtained by reading
/sys/kernel/mm/hugepages directory entries) to strings used for hugetlb
cgroup controller resource files. Those strings are then used to get the
hugetlb resource statistics.

This function used an external library and floating point numbers, and could
(theoretically) produce invalid values, since the kernel only uses KB,
MB, and GB suffixes.

Rewrite it to produce the same strings as used in the kernel (see [1]).
As a result, it's also faster, more future-proof (entries that do not
start with "hugepages-" and/or have an incorrect suffix are skipped), and does
more input sanity checks. As a side effect, libcontainer no longer
depends on docker/go-units.

While at it, add more test cases.

Before:
	BenchmarkGetHugePageSize-8       	  187452	      6265 ns/op
	BenchmarkGetHugePageSizeImpl-8   	  396769	      2998 ns/op

After:
	BenchmarkGetHugePageSize-8       	  222898	      4554 ns/op
	BenchmarkGetHugePageSizeImpl-8   	 4738924	       241 ns/op

NOTE on removing HugePageSizeUnitList -- this was added by commit
6f77e35da and was used by kubernetes code in [2], which was later
superseded by [3], so there are (hopefully) no external users.
If there are any, they should not be doing that.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?id=eff48ddeab782e35e58ccc8853f7386bbae9dec4#n574
[2] https://github.com/kubernetes/kubernetes/pull/78495
[3] https://github.com/kubernetes/kubernetes/pull/84154

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-30 10:58:31 -07:00
Kir Kolyshkin
8ceae9f766 libct/cgroups/GetHugePageSize: use Readdirnames
ioutil.ReadDir does a stat() on every entry and returns a slice of
os.FileInfo structures. What we need here is just a file name.

This change both simplifies and speeds up the code a bit.

Before:
	BenchmarkGetHugePageSize-8       	  115213	      9400 ns/op

After:
	BenchmarkGetHugePageSize-8       	  190326	      6187 ns/op

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-22 16:08:56 -07:00
Kir Kolyshkin
19be8e5ba5 libct/cgroups.RemovePaths: speedup
Using os.RemoveAll has the following two issues:

 1. it tries to remove all files, which does not make sense for cgroups;
 2. it tries unlink(2), which fails on directories, and then rmdir(2).

Let's reuse our RemovePath instead, and add warnings and errors logging.

PS I am somewhat hesitant to remove the weird checking by means of stat,
as it might break something. Unfortunately, neither commit 6feb7bda04
nor the PR that contains it [1] explains what kind of weird errors were
seen from os.RemoveAll. Most probably our code won't return any bogus
errors, but let's keep the old code to be on the safe side.

[1] https://github.com/docker-archive/libcontainer/pull/308

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin
3f14242e0a libct/cgroups: move RemovePath from fs2
This is to be used by RemovePaths.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin
254d23b964 libc/cgroups: empty map in RemovePaths
RemovePaths() deletes elements from the paths map for paths that have
been successfully removed.

However, it does not empty the map itself (which is needed because, AFAIK,
the Go garbage collector does not shrink maps); all its callers do that.

Move this operation from callers to RemovePaths.

No functional change, except the old map should be garbage collected now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin
89516d17dd libct/cgroups/readProcsFile: ret error if scan failed
Not sure why but the errors from scanner were ignored. Such errors
can happen if open(2) has succeeded but the subsequent read(2) fails.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-17 12:33:01 -07:00
Kir Kolyshkin
0681d456fc libct/cgroups/utils: move cgroup v1 code to separate file
In most projects, "utils" is a big mess, and this is not an exception.
Try to clean it up a bit by moving cgroup v1 specific code to a separate
source file.

There are no code changes in this commit, just moving it from one file
to another.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:07 -07:00
Kir Kolyshkin
7db2d3e146 libcontainer/cgroups: rm FindCgroupMountpointDir
This function is cgroupv1-specific, is only used once, and its name
is very close to the name of another function, FindCgroupMountpoint.

Inline it into the (only) caller.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:15 -07:00
Kir Kolyshkin
d244b4058e libct/cgroups: improve ParseCgroupFile docs
In particular, state that for cgroup v2 the result is very different.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:08 -07:00
Kir Kolyshkin
5785aabc13 libct/cgroups: make isSubsystemAvailable v1-specific
This function is only called from cgroupv1 code, so there is no need
for it to implement cgroupv2 stuff.

Make it v1-specific, and panic if it is called from v2 code (since this
is an internal function, the panic would mean incorrect runc code).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:04 -07:00
Kir Kolyshkin
142d0f2d5d libct/cgroups/utils: make FindCgroupMountpoint* v1-specific
It's bad and wrong to use these functions for any cgroupv2 code,
and there are no existing users (in runc, at least).

Make them return an error in such case.

Also, remove the cgroupv2-specific handling from
findCgroupMountpointAndRootFromReader().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:58 -07:00
Kir Kolyshkin
44b75e760e libct/cgroups: separate getCgroupMountsV1
This function should not really be used for cgroupv2 code.
Currently it is used in kubernetes code, so we can't remove
the v2 case yet.

Add a TODO item to remove v2 code once kubernetes is converted
to not use it, and separate out v1 code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:06 -07:00
Kir Kolyshkin
3834222d88 libct/cgroups/utils: getControllerPath return err for v2
This function is not used and was never used in any cgroupv2 code.

To have it stay that way, let it return error in case it's called
for v2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 20:23:59 -07:00
Kir Kolyshkin
4189cb65f8 cgroups: remove cgroup.Resources.CpuMax
This (and the converting function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let the fs2 do the same.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:15:38 -07:00
Kir Kolyshkin
3c6e8ac4d2 cgroupv2: set mem+swap to max if mem set to max
... and mem+swap is not explicitly set otherwise.

This ensures compatibility with cgroupv1 controller which interprets
things this way.

With this fixed, we can finally enable swap tests for cgroupv2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-22 21:32:16 -07:00
Kir Kolyshkin
2db3240f35 libct/cgroups: rm GetClosestMountpointAncestor
The function GetClosestMountpointAncestor is not very efficient,
does not really belong to cgroup package, and is only used once
(from fs/cpuset.go).

Remove it, replacing with the implementation based on moby/sys/mountinfo
parser.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 17:32:06 -07:00
Kir Kolyshkin
f160352682 libct/cgroup: prep to rm GetClosestMountpointAncestor
This function is not very efficient, does not really belong to cgroup
package, and is only used once (from fs/cpuset.go).

Prepare to remove it by replacing it with an implementation based on
the parser from github.com/moby/sys/mountinfo.

This commit is here to make sure the proposed replacement passes the
unit test.

Funny, but the unit test needs to be slightly modified since it
supplies the wrong mountinfo (space as the first character, empty line
at the end).

Validated by

 $ go test -v -run Ance
 === RUN   TestGetClosestMountpointAncestor
 --- PASS: TestGetClosestMountpointAncestor (0.00s)
 PASS
 ok  	github.com/opencontainers/runc/libcontainer/cgroups	0.002s

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 16:26:16 -07:00
lifubang
a70f354680 let runc disable swap in cgroup v2
In cgroup v2, when memory and memorySwap are set to the same value, which is greater than zero,
runc should write zero to `memory.swap.max` to disable swap.

Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-05-03 20:57:36 +08:00
Kenta Tada
e58a406b77 libcontainer: remove unneeded import
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2020-04-09 20:14:39 +09:00
Kir Kolyshkin
c86be8a2c1 cgroupv2: fix setting MemorySwap
The resources.MemorySwap field from OCI is memory+swap, while cgroupv2
has a separate swap limit, so subtract memory from the limit (and make
sure values are set and sane).

Make sure to set MemorySwapMax for systemd, too. Since systemd does not
have MemorySwapMax for cgroupv1, it is only needed for v2 driver.

[v2: return -1 on any negative value, add unit test]
[v3: treat any negative value other than -1 as error]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-07 20:45:53 -07:00
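The subtraction described in the commit above can be sketched as follows. This is a sketch under assumptions, not the function from the commit: the name `swapLimitV2` is hypothetical, 0 is assumed to mean "not set", and -1 is assumed to mean "unlimited", following the bracketed notes about negative values.

```go
package main

import (
	"errors"
	"fmt"
)

// swapLimitV2 (hypothetical) converts an OCI-style memory+swap limit
// into a swap-only limit suitable for cgroup v2.
// Assumed conventions: 0 means "not set", -1 means "unlimited".
func swapLimitV2(memorySwap, memory int64) (int64, error) {
	switch {
	case memorySwap == -1 || memorySwap == 0:
		// Unlimited or unset: pass through unchanged.
		return memorySwap, nil
	case memorySwap < -1:
		return 0, errors.New("invalid memory+swap value")
	case memory <= 0:
		return 0, errors.New("memory+swap is set but memory limit is not")
	case memory > memorySwap:
		return 0, errors.New("memory+swap limit is smaller than memory limit")
	}
	return memorySwap - memory, nil
}

func main() {
	fmt.Println(swapLimitV2(200, 100)) // swap-only part of the limit
	fmt.Println(swapLimitV2(100, 100)) // equal values leave no room for swap
	fmt.Println(swapLimitV2(-2, 100))  // rejected as invalid
}
```

When both limits are equal the result is zero, which corresponds to disabling swap.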
Kir Kolyshkin
b2272b2cba libcontainer: use errors.Is() and errors.As()
Make use of errors.Is() and errors.As() where appropriate to check
the underlying error. The biggest motivation is to simplify the code.

The feature requires go 1.13 but since merging #2256 we are already
not supporting go 1.12 (which is an unsupported release anyway).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 20:34:01 -07:00
Kir Kolyshkin
c39f87a47a Revert "Merge pull request #2280 from kolyshkin/errors-unwrap"
Using errors.Unwrap() is not the best thing to do, since it returns
nil in case of an error which was not wrapped. Moreover, the
errors package provides more elegant ways to check for underlying
errors, such as errors.As() and errors.Is().

This reverts commit f8e138855d, reversing
changes made to 6ca9d8e6da.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 19:41:11 -07:00
Kir Kolyshkin
272c83e169 libct/cgroups: use errors.Unwrap
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-31 20:07:04 -07:00
Kir Kolyshkin
b45db5d3b2 libcontainer/cgroup: obsolete Get*Cgroup for v2
These functions should not be called from any code handling
the cgroup2 unified hierarchy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 19:20:00 -07:00
Kir Kolyshkin
5542a2c77d libcontainer/cgroups: GetAllPids: optimize
1. Return earlier if there is an error.

2. Do not use filepath.Split on every entry, use info.Name() instead.

3. Make readProcsFile() accept file name as an argument, to avoid
   unnecessary file name and directory splitting and merging.

4. Skip on info.IsDir() -- this avoids an error when cgroup name is
   set to "cgroup.procs".

This is still not very good since filepath.Walk() performs an unnecessary
stat(2) on every entry, but better than before.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 12:27:36 -07:00
Akihiro Suda
aa269315a4 cgroup2: add CpuMax conversion
Fix #2243

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:58:39 +09:00
Akihiro Suda
64e9a97981 cgroup2: fix conversion
* TestConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000

Fix #2244
Follow-up to #2212 #2213

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:57:07 +09:00
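The expected values listed in the commit above are consistent with linear mappings between the v1 and v2 ranges. The sketch below assumes the ranges [2, 262144] to [1, 10000] for CPU shares and [10, 1000] to [1, 10000] for blkio weight. The function names are hypothetical and the formulas are a reconstruction from those ranges, not a quotation of the fixed code.

```go
package main

import "fmt"

// cpuSharesToWeight (hypothetical name) maps cgroup v1 cpu.shares in
// [2, 262144] linearly onto cgroup v2 cpu.weight in [1, 10000].
// Zero means "not set" and is passed through. Callers are assumed to
// have validated the range; a value of 1 would underflow.
func cpuSharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	return 1 + ((shares-2)*9999)/262142
}

// blkioWeightToIOWeight (hypothetical name) maps a blkio weight in
// [10, 1000] linearly onto an io.weight in [1, 10000].
// Zero means "not set"; values 1..9 are assumed to be rejected earlier.
func blkioWeightToIOWeight(w uint16) uint64 {
	if w == 0 {
		return 0
	}
	return 1 + (uint64(w)-10)*9999/990
}

func main() {
	for _, s := range []uint64{0, 2, 1024, 262144} {
		fmt.Printf("cpu.shares %d -> cpu.weight %d\n", s, cpuSharesToWeight(s))
	}
	for _, w := range []uint16{0, 10, 500, 1000} {
		fmt.Printf("blkio weight %d -> io.weight %d\n", w, blkioWeightToIOWeight(w))
	}
}
```

The explicit zero check matters because the subtraction is done on unsigned integers, so an unguarded zero would wrap around to a huge value.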
Boris Popovschi
4b8134f63b Convert blkioWeight to io.weight properly
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-18 15:44:07 +02:00
Boris Popovschi
7c439cc6f6 Added conversion for cpu.weight v2
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-12 11:32:34 +02:00
Akihiro Suda
74a3fe5d1b cgroup2: do not parse /proc/cgroups
/proc/cgroups is meaningless for v2 and should be ignored.

https://github.com/torvalds/linux/blob/v5.3/Documentation/admin-guide/cgroup-v2.rst#deprecated-v1-core-features

* Now GetAllSubsystems() parses /sys/fs/cgroup/cgroup.controllers, not /proc/cgroups.
  The function result also contains "pseudo" controllers: {"devices", "freezer"}.
  As it is hard to detect availability of pseudo controllers, pseudo controllers
  are always assumed to be available.

* Now IOGroupV2.Name() returns "io", not "blkio"

Fix #2155 #2156

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-28 00:00:33 +09:00
Michael Crosby
b28f58f31b Set unified mountpoint in find mnt func
This is needed for the fsv2 cgroups to work when there is a unified mountpoint.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-15 15:40:03 -04:00
Giuseppe Scrivano
1932917b71 libcontainer: add initial support for cgroups v2
Allow setting which subsystems are used by
libcontainer/cgroups/fs.Manager.

subsystemsUnified is used on a system running with cgroups v2 unified
mode.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:25 +02:00
Odin Ugedal
6f77e35daf Export list of HugePageSizeUnits
This will allow others to import it instead of copying it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 20:17:30 +02:00
Odin Ugedal
c6445b1c1c Add tests for GetHugePageSize
Add tests to avoid regressions

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 17:27:32 +02:00
Odin Ugedal
273e7b74a7 Fix cgroup hugetlb size prefix for kB
The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).

The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with small
amounts of ram (see
https://github.com/kubernetes/kubernetes/issues/77169) running a kernel
with config flag CONFIG_HUGETLBFS=y

As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.

Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"

And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-29 21:52:43 +02:00