Remove intelrtd.Manager interface, since we only have a single
implementation, and do not expect another one.
Rename intelRdtManager to Manager, and modify its users accordingly.
Remove NewIntelRdtManager from factory.
Remove IntelRdtfs. Instead, make intelrdt.NewManager return nil if the
feature is not available.
Remove TestFactoryNewIntelRdt as it is now identical to TestFactoryNew.
Add internal function newManager to be used for tests (to make sure
some testing is done even when the feature is not available in
kernel/hardware).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For the Nth time I wanted to replace parsing mountinfo with
statfs and the check for superblock magic, but it is not possible
since some code relies of mount options check which can only
be obtained via mountinfo.
Add a note about it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In a (quite common) case RDT is not supported by the kernel/hardware,
it does not make sense to parse /proc/cpuinfo and /proc/self/mountinfo,
and yet the current code does it (on every runc exec, for example).
Fortunately, there is a quick way to check whether RDT is available --
if so, kernel creates /sys/fs/resctrl directory. Check its existence,
and skip all the other initialization if it's not present.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This test was written back in the day when findIntelRdtMountpointDir
was using its own mountinfo parser. Commit f1c1fdf911 changed that,
and thus this test is actually testing moby/sys/mountinfo parser, which
does not make much sense.
Remove the test, and drop the io.Reader argument since we no longer need
to parse a custom file.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case resctrl filesystem can not be found in /proc/self/mountinfo
(which is pretty common on non-server or non-x86 hardware), subsequent
calls to Root() will result in parsing it again and again.
Use sync.Once to avoid it. Make unit tests call it so that Root() won't
actually parse mountinfo in tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since commit 7296dc1712, type intelRdtData is only used by tests,
and since commit 79d292b9f, its only member is config.
Change the test to use config directly, and remove the type.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Export getIntelRdtRoot function as Root.
This is needed by google/cadvisor, which is (ab)using GetIntelRdtPath,
removed by commit 7296dc1712.
While at it, do some minimal refactoring to always use Root()
internally, not relying on variable value. Other than that it's just
some renaming.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Only some libcontainer packages can be built on non-linux platforms
(not that it make sense, but at least go build succeeds). Let's call
these "good" packages.
For all other packages (i.e. ones that fail to build with GOOS other
than linux), it does not make sense to have linux build tag (as they
are broken already, and thus are not and can not be used on anything
other than Linux).
Remove linux build tag for all non-"good" packages.
This was mostly done by the following script, with just a few manual
fixes on top.
function list_good_pkgs() {
for pkg in $(find . -type d -print); do
GOOS=freebsd go build $pkg 2>/dev/null \
&& GOOS=solaris go build $pkg 2>/dev/null \
&& echo $pkg
done | sed -e 's|^./||' | tr '\n' '|' | sed -e 's/|$//'
}
function remove_tag() {
sed -i -e '\|^// +build linux$|d' $1
go fmt $1
}
SKIP="^("$(list_good_pkgs)")"
for f in $(git ls-files . | grep .go$); do
if echo $f | grep -qE "$SKIP"; then
echo skip $f
continue
fi
echo proc $f
remove_tag $f
done
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Use the term "clos group" instead of "container_id group" as the group
that a container belongs to is not necessarily tied to its container id.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Check that the ClosID directory pre-exists if no L3 or MB schema has
been specified. Conform with the following line from runtime-spec
(config-linux):
If closID is set, and neither of l3CacheSchema and memBwSchema are
set, runtime MUST check if corresponding pre-configured directory
closID is present in mounted resctrl. If such pre-configured directory
closID exists, runtime MUST assign container to this closID and
generate an error if directory does not exist.
Add a TODO note for verifying existing schemata against L3/MB
parameters.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Handle ClosID parameter of IntelRdt. Makes it possible to use
pre-configured classes/ClosIDs and avoid running out of available IDs
which easily happens with per-container classes.
Remove validator checks for empty L3CacheSchema and MemBwSchema fields
in order to be able to leave them empty, and only specify ClosID for
a pre-configured class.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
These are not used anywhere outside of the package
(I have also checked the only external user of the package
(github.com/google/cadvisor).
No changes other than changing the case. The following
identifiers are now private:
* IntelRdtTasks
* NewLastCmdError
* NewStats
Brought to you by gorename.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
... the stack, so every caller will automatically benefit from it.
The only change that it causes is the user in
libcontainer/process_linux.go will get a better error message.
[v2: typo fix]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For errors that only have a string and an underlying error, using
fmt.Errorf with %w to wrap an error is sufficient.
In this particular case, the code is simplified, and now we have
unwrappable errors as a bonus (same could be achieved by adding
(*LastCmdError).Unwrap() method, but that's adding more code).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Initially, this was copied over from libcontainer/cgroups, where it made
sense as for cgroup v1 we have multiple controllers and mount points.
Here, we only have a single mount, so there's no need for the whole
type.
Replace all that with a simple error (which is currently internal since
the only user is our own test case).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case getIntelRdtData() returns an error, d is set to nil.
In case the error returned is of NotFoundError type (which happens
if resctlr mount is not found in /proc/self/mountinfo), the function
proceeds to call d.join(), resulting in a nil deref and a panic.
In practice, this never happens in runc because of the checks in
intelrdt() function in libcontainer/configs/validate, which raises
an error in case any of the parameters are set in config but
the IntelRTD itself is not available (that includes checking
that the mount point is there).
Nevertheless, the code is wrong, and can result in nil dereference
if some external users uses Apply on a system without resctrl mount.
Fix this by removing the exclusion.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Do this for all errors except one from unix.*.
This fixes a bunch of errorlint warnings, like these
libcontainer/generic_error.go:25:15: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
if le, ok := err.(Error); ok {
^
libcontainer/factory_linux_test.go:145:14: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
lerr, ok := err.(Error)
^
libcontainer/state_linux_test.go:28:11: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
_, ok := err.(*stateTransitionError)
^
libcontainer/seccomp/patchbpf/enosys_linux.go:88:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
switch err {
^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This one is tough as errorlint insists on using errors.Is, and the
latter is known to not work for Go 1.13 which we still support.
So, add a nolint annotation to suppress the warning, and a TODO to
address it later.
For intelrdt, we can do the same, but it is easier to reuse the very
same function from fscommon (note we can't use fscommon for other stuff
as it expects cgroupfs).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
An errror from ioutil.WriteFile already contains file name, so there is
no need to duplicate that information.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This commit adjusts the file mode to use the latest golang style
and also changes the file mode value in accordance with default.
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
Use sync.Once to init Intel RDT when needed for a small speedup to
operations which do not require Intel RDT.
Simplify IntelRdtManager initialization in LinuxFactory.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Intel RDT sub-features can be selectively disabled or enabled by kernel
command line. See "rdt=" option details in kernel document:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
But Cache Monitoring Technology (CMT) feature is not correctly checked
in init() and getCMTNumaNodeStats() now. If CMT is disabled by kernel
command line (e.g., rdt=!cmt,mbmtotal,mbmlocal,l3cat,mba) while hardware
supports CMT, we may get following error when getting Intel RDT stats:
runc run c1
runc events c1
ERRO[0005] container_linux.go:200: getting container's Intel RDT stats
caused: open /sys/fs/resctrl/c1/mon_data/mon_L3_00/llc_occupancy: no
such file or directory
Fix CMT feature check in init() and GetStats() call paths.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
It might be a tad slower but it surely more correct and well maintained,
so it's better to use it than rely on a custom implementation which is
kind of hard to get entirely right.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The line we are parsing looks like this
> flags : fpu vme de pse <...>
so look for "flags" as a prefix, not substring.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Err() method should be called after the Scan() loop, not inside it.
Found by
git grep -A3 -F '.Scan()' | grep Err
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This patch fixes a corner case when destroy a container:
If we start a container without 'intelRdt' config set, and then we run
“runc update --l3-cache-schema/--mem-bw-schema” to add 'intelRdt' config
implicitly.
Now if we enter "exit" from the container inside, we will pass through
linuxContainer.Destroy() -> state.destroy() -> intelRdtManager.Destroy().
But in IntelRdtManager.Destroy(), IntelRdtManager.Path is still null
string, it hasn’t been initialized yet. As a result, the created rdt
group directory during "runc update" will not be removed as expected.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
MBA Software Controller feature is introduced in Linux kernel v4.18.
It is a software enhancement to mitigate some limitations in MBA which
describes in kernel documentation. It also makes the interface more user
friendly - we could specify memory bandwidth in "MBps" (Mega Bytes per
second) as well as in "percentages".
The kernel underneath would use a software feedback mechanism or a
"Software Controller" which reads the actual bandwidth using MBM
counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".
We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.
For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Linux kernel v4.15 introduces better diagnostics for Intel RDT operation
errors. If any error returns when making new directories or writing to
any of the control file in resctrl filesystem, reading file
/sys/fs/resctrl/info/last_cmd_status could provide more information that
can be conveyed in the error returns from file operations.
Some examples:
echo "L3:0=f3;1=ff" > /sys/fs/resctrl/container_id/schemata
-bash: echo: write error: Invalid argument
cat /sys/fs/resctrl/info/last_cmd_status
mask f3 has non-consecutive 1-bits
echo "MB:0=0;1=110" > /sys/fs/resctrl/container_id/schemata
-bash: echo: write error: Invalid argument
cat /sys/fs/resctrl/info/last_cmd_status
MB value 0 out of range [10,100]
cd /sys/fs/resctrl
mkdir 1 2 3 4 5 6 7 8
mkdir: cannot create directory '8': No space left on device
cat /sys/fs/resctrl/info/last_cmd_status
out of CLOSIDs
See 'last_cmd_status' for more details in kernel documentation:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
In runc, we could append the diagnostics information to the error
message of Intel RDT operation errors to provide more user-friendly
information.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Double check if Intel RDT sub-features are available in "resource
control" filesystem. Intel RDT sub-features can be selectively disabled
or enabled by kernel command line (e.g., rdt=!l3cat,mba) in 4.14 and
newer kernel.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.
Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm
In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel
config CONFIG_INTEL_RDT. If hardware support, CPU flags `rdt_a` and
`mba` will be set in /proc/cpuinfo.
Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
| |-- L3
| | |-- cbm_mask
| | |-- min_cbm_bits
| | |-- num_closids
| |-- MB
| |-- bandwidth_gran
| |-- delay_linear
| |-- min_bandwidth
| |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
|-- ...
|-- schemata
|-- tasks
For MBA support for `runc`, we will reuse the infrastructure and code
base of Intel RDT/CAT which implemented in #1279. We could also make
use of `tasks` and `schemata` configuration for memory bandwidth
resource constraints.
The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.
The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.
Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which
contains L3 cache id and memory bandwidth percentage.
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth of 10% with a memory bandwidth granularity of 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.
"linux": {
"intelRdt": {
"memBwSchema": "MB:0=20;1=70"
}
}
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
This fixes a GetStats() issue introduced in #1590:
If Intel RDT is enabled by hardware and kernel, but intelRdt is not
specified in original config, GetStats() will return error unexpectedly
because we haven't called Apply() to create intelrdt group or attach
tasks for this container. As a result, runc events command will have no
output.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>