The OCI runtime spec mandates "[i]f intelRdt is not set, the runtime
MUST NOT manipulate any resctrl pseudo-filesystems." Attempting to
delete files counts as manipulating, so stop doing that when the
container's RDT configuration is nil.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The intelrdt package only needs to parse mountinfo to find the mount
point of the resctrl filesystem. Users are generally going to mount the
resctrl filesystem to the pre-created /sys/fs/resctrl directory, so
there is a common case where mountinfo parsing is not required. Optimize
for the common case with a fast path which checks both for the existence
of the /sys/fs/resctrl directory and whether the resctrl filesystem was
mounted to that path using a single statfs syscall.
Signed-off-by: Cory Snider <csnider@mirantis.com>
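The fast path described above can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the actual libcontainer code: the helper name isResctrlMount is hypothetical, and the magic constant is assumed to be the kernel's RDTGROUP_SUPER_MAGIC value. A single statfs call fails if the directory is missing and otherwise reports the filesystem type, so both checks are covered at once.

```go
package main

import (
	"fmt"
	"syscall"
)

// resctrlSuperMagic is assumed to match RDTGROUP_SUPER_MAGIC from the
// kernel's include/uapi/linux/magic.h.
const resctrlSuperMagic = 0x7655821

// isResctrlMount (hypothetical name) reports whether path exists and lives
// on a resctrl filesystem. A missing directory makes statfs fail, so the
// existence check and the filesystem type check cost one syscall in total.
func isResctrlMount(path string) bool {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false
	}
	// The width of st.Type differs between architectures, hence the cast.
	return int64(st.Type) == resctrlSuperMagic
}

func main() {
	// Only when this prints false would a caller need to fall back to
	// parsing /proc/self/mountinfo.
	fmt.Println(isResctrlMount("/sys/fs/resctrl"))
}
```

The sketch does not retry on EINTR, which production code may want to do.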
Reading /proc/cpuinfo is a surprisingly expensive operation. Since
kernel version 4.12 [1], opening /proc/cpuinfo on an x86 system can
block for around 20 milliseconds while the kernel samples the current
CPU frequency. There is a very recent patch [2] which gets rid of the
delay, but has yet to make it into the mainline kernel. Regardless,
kernels for which opening /proc/cpuinfo takes 20ms will continue to be
run in production for years to come. libcontainer only opens
/proc/cpuinfo to read the processor feature flags so all the delays to
get an accurate snapshot of the CPU frequency are just wasted time.
If we wanted to, we could interrogate the CPU features directly from
userspace using the `CPUID` instruction. However, Intel and AMD CPUs
have flags in different positions for their analogous sub-features and
there are CPU quirks [3] which would need to be accounted for. Some
Haswell server CPUs support RDT/CAT but are missing the `CPUID` flags
advertising their support; the kernel checks for support on that
processor family by probing the hardware using privileged
RDMSR/WRMSR instructions [4]. This sort of probing could not be
implemented in userspace so it would not be possible to check for RDT
feature support in userspace without false negatives on some hardware
configurations.
It looks like libcontainer reads the CPU feature flags as a kind of
optimization so that it can skip checking whether the kernel supports an
RDT sub-feature if the hardware support is missing. As the kernel only
exposes subtrees in the `resctrl` filesystem for RDT sub-features with
hardware and kernel support, checking the CPU feature flags is redundant
from a correctness point of view. Remove the /proc/cpuinfo check as it
is an optimization which actually hurts performance.
[1]: https://unix.stackexchange.com/a/526679
[2]: https://lore.kernel.org/all/20220415161206.875029458@linutronix.de/
[3]: 7cf6a8a17f/arch/x86/kernel/cpu/resctrl/core.c (L834-L851)
[4]: a6b450573b/arch/x86/kernel/cpu/resctrl/core.c (L111-L153)
Signed-off-by: Cory Snider <csnider@mirantis.com>
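As an illustration of relying on the resctrl subtrees instead of the CPU flags, a check could look like the sketch below. The helper name featureEnabled is hypothetical, and the directory names (info/L3, info/MB, info/L3_MON) are the ones the kernel documentation lists for cache allocation, memory bandwidth allocation and L3 monitoring.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// featureEnabled (hypothetical name) reports whether the resctrl filesystem
// mounted at root exposes an info/<name> directory. The kernel only creates
// that directory when both the hardware and the kernel support the feature.
func featureEnabled(root, name string) bool {
	fi, err := os.Stat(filepath.Join(root, "info", name))
	return err == nil && fi.IsDir()
}

func main() {
	for _, name := range []string{"L3", "MB", "L3_MON"} {
		fmt.Printf("%s: %v\n", name, featureEnabled("/sys/fs/resctrl", name))
	}
}
```

No /proc/cpuinfo read is involved, so the 20ms delay mentioned above is avoided.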
Remove intelrdt.Manager interface, since we only have a single
implementation, and do not expect another one.
Rename intelRdtManager to Manager, and modify its users accordingly.
Remove NewIntelRdtManager from factory.
Remove IntelRdtfs. Instead, make intelrdt.NewManager return nil if the
feature is not available.
Remove TestFactoryNewIntelRdt as it is now identical to TestFactoryNew.
Add internal function newManager to be used for tests (to make sure
some testing is done even when the feature is not available in
kernel/hardware).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For the Nth time I wanted to replace parsing mountinfo with
statfs and a check for the superblock magic, but that is not possible
since some code relies on a mount options check, and the options can
only be obtained via mountinfo.
Add a note about it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In the (quite common) case where RDT is not supported by the
kernel/hardware, it does not make sense to parse /proc/cpuinfo and
/proc/self/mountinfo, and yet the current code does it (on every runc
exec, for example).
Fortunately, there is a quick way to check whether RDT is available:
if so, the kernel creates the /sys/fs/resctrl directory. Check its
existence, and skip all the other initialization if it's not present.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This test was written back in the day when findIntelRdtMountpointDir
was using its own mountinfo parser. Commit f1c1fdf911 changed that,
and thus this test is actually testing moby/sys/mountinfo parser, which
does not make much sense.
Remove the test, and drop the io.Reader argument since we no longer need
to parse a custom file.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case resctrl filesystem can not be found in /proc/self/mountinfo
(which is pretty common on non-server or non-x86 hardware), subsequent
calls to Root() will result in parsing it again and again.
Use sync.Once to avoid it. Make unit tests call it so that Root() won't
actually parse mountinfo in tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
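The caching described above might look like this sketch. It assumes a hypothetical findMountpoint function standing in for the mountinfo parsing, and it counts lookups only to make the effect visible. The point is that sync.Once caches the failure as well as the success, so a missing mount is not searched for repeatedly.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	rootOnce sync.Once
	rootPath string
	rootErr  error

	lookups int // number of expensive lookups; for demonstration only
)

var errNotFound = errors.New("resctrl mount point not found")

// findMountpoint is a hypothetical stand-in for parsing
// /proc/self/mountinfo. Here it always fails, as it would on a system
// without a resctrl mount.
func findMountpoint() (string, error) {
	lookups++
	return "", errNotFound
}

// Root returns the cached lookup result. Both the path and the error are
// stored, so a failed lookup is not retried on later calls.
func Root() (string, error) {
	rootOnce.Do(func() {
		rootPath, rootErr = findMountpoint()
	})
	return rootPath, rootErr
}

func main() {
	for i := 0; i < 3; i++ {
		_, err := Root()
		fmt.Println(err)
	}
	fmt.Println("lookups:", lookups)
}
```

A unit test can prime the cache by calling rootOnce.Do with its own function before the code under test runs, which is one way to keep Root() from touching the real mountinfo.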
Since commit 7296dc1712, type intelRdtData is only used by tests,
and since commit 79d292b9f, its only member is config.
Change the test to use config directly, and remove the type.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Export getIntelRdtRoot function as Root.
This is needed by google/cadvisor, which is (ab)using GetIntelRdtPath,
removed by commit 7296dc1712.
While at it, do some minimal refactoring to always use Root()
internally, not relying on variable value. Other than that it's just
some renaming.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Only some libcontainer packages can be built on non-linux platforms
(not that it makes sense, but at least go build succeeds). Let's call
these "good" packages.
For all other packages (i.e. ones that fail to build with GOOS other
than linux), it does not make sense to have linux build tag (as they
are broken already, and thus are not and can not be used on anything
other than Linux).
Remove linux build tag for all non-"good" packages.
This was mostly done by the following script, with just a few manual
fixes on top.

    function list_good_pkgs() {
        for pkg in $(find . -type d -print); do
            GOOS=freebsd go build $pkg 2>/dev/null \
                && GOOS=solaris go build $pkg 2>/dev/null \
                && echo $pkg
        done | sed -e 's|^./||' | tr '\n' '|' | sed -e 's/|$//'
    }

    function remove_tag() {
        sed -i -e '\|^// +build linux$|d' $1
        go fmt $1
    }

    SKIP="^("$(list_good_pkgs)")"
    for f in $(git ls-files . | grep .go$); do
        if echo $f | grep -qE "$SKIP"; then
            echo skip $f
            continue
        fi
        echo proc $f
        remove_tag $f
    done

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Use the term "clos group" instead of "container_id group" as the group
that a container belongs to is not necessarily tied to its container id.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Check that the ClosID directory pre-exists if no L3 or MB schema has
been specified. Conform with the following line from runtime-spec
(config-linux):
If closID is set, and neither of l3CacheSchema and memBwSchema are
set, runtime MUST check if corresponding pre-configured directory
closID is present in mounted resctrl. If such pre-configured directory
closID exists, runtime MUST assign container to this closID and
generate an error if directory does not exist.
Add a TODO note for verifying existing schemata against L3/MB
parameters.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
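The runtime-spec rule quoted above can be sketched as follows. The rdtConfig type and the checkClosID function are hypothetical, trimmed-down stand-ins used only for illustration, and root is whatever directory resctrl is mounted at.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rdtConfig is a hypothetical, trimmed-down stand-in for the container's
// Intel RDT configuration.
type rdtConfig struct {
	ClosID        string
	L3CacheSchema string
	MemBwSchema   string
}

// checkClosID (hypothetical name) enforces the rule: if closID is set and
// neither schema is set, the directory <root>/<closID> must already exist.
func checkClosID(root string, c rdtConfig) error {
	if c.ClosID == "" || c.L3CacheSchema != "" || c.MemBwSchema != "" {
		return nil // the rule does not apply
	}
	dir := filepath.Join(root, c.ClosID)
	fi, err := os.Stat(dir)
	if err != nil {
		return fmt.Errorf("pre-configured clos directory is required when no schema is set: %w", err)
	}
	if !fi.IsDir() {
		return fmt.Errorf("%s is not a directory", dir)
	}
	return nil
}

func main() {
	err := checkClosID("/sys/fs/resctrl", rdtConfig{ClosID: "guaranteed"})
	fmt.Println(err)
}
```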
Handle ClosID parameter of IntelRdt. Makes it possible to use
pre-configured classes/ClosIDs and avoid running out of available IDs
which easily happens with per-container classes.
Remove validator checks for empty L3CacheSchema and MemBwSchema fields
in order to be able to leave them empty, and only specify ClosID for
a pre-configured class.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
These are not used anywhere outside of the package
(I have also checked the only external user of the package,
github.com/google/cadvisor).
No changes other than changing the case. The following
identifiers are now private:
* IntelRdtTasks
* NewLastCmdError
* NewStats
Brought to you by gorename.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
... the stack, so every caller will automatically benefit from it.
The only change that it causes is the user in
libcontainer/process_linux.go will get a better error message.
[v2: typo fix]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For errors that only have a string and an underlying error, using
fmt.Errorf with %w to wrap an error is sufficient.
In this particular case, the code is simplified, and now we have
unwrappable errors as a bonus (same could be achieved by adding
(*LastCmdError).Unwrap() method, but that's adding more code).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Initially, this was copied over from libcontainer/cgroups, where it made
sense as for cgroup v1 we have multiple controllers and mount points.
Here, we only have a single mount, so there's no need for the whole
type.
Replace all that with a simple error (which is currently internal since
the only user is our own test case).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case getIntelRdtData() returns an error, d is set to nil.
In case the error returned is of NotFoundError type (which happens
if resctrl mount is not found in /proc/self/mountinfo), the function
proceeds to call d.join(), resulting in a nil deref and a panic.
In practice, this never happens in runc because of the checks in
intelrdt() function in libcontainer/configs/validate, which raises
an error in case any of the parameters are set in config but
the Intel RDT itself is not available (that includes checking
that the mount point is there).
Nevertheless, the code is wrong, and can result in nil dereference
if some external user calls Apply on a system without resctrl mount.
Fix this by removing the exclusion.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Do this for all errors except one from unix.*.
This fixes a bunch of errorlint warnings, like these:
libcontainer/generic_error.go:25:15: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
if le, ok := err.(Error); ok {
^
libcontainer/factory_linux_test.go:145:14: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
lerr, ok := err.(Error)
^
libcontainer/state_linux_test.go:28:11: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
_, ok := err.(*stateTransitionError)
^
libcontainer/seccomp/patchbpf/enosys_linux.go:88:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
switch err {
^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This one is tough as errorlint insists on using errors.Is, and the
latter is known to not work for Go 1.13 which we still support.
So, add a nolint annotation to suppress the warning, and a TODO to
address it later.
For intelrdt, we can do the same, but it is easier to reuse the very
same function from fscommon (note we can't use fscommon for other stuff
as it expects cgroupfs).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
An error from ioutil.WriteFile already contains the file name, so there is
no need to duplicate that information.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This commit adjusts the file mode to use the latest golang style
and also changes the file mode value in accordance with default.
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
Use sync.Once to init Intel RDT when needed for a small speedup to
operations which do not require Intel RDT.
Simplify IntelRdtManager initialization in LinuxFactory.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Intel RDT sub-features can be selectively disabled or enabled by kernel
command line. See "rdt=" option details in kernel document:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
However, the Cache Monitoring Technology (CMT) feature is currently not
checked correctly in init() and getCMTNumaNodeStats(). If CMT is disabled
by the kernel command line (e.g., rdt=!cmt,mbmtotal,mbmlocal,l3cat,mba)
while the hardware supports CMT, we may get the following error when
getting Intel RDT stats:
runc run c1
runc events c1
ERRO[0005] container_linux.go:200: getting container's Intel RDT stats
caused: open /sys/fs/resctrl/c1/mon_data/mon_L3_00/llc_occupancy: no
such file or directory
Fix CMT feature check in init() and GetStats() call paths.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
It might be a tad slower but it is surely more correct and better maintained,
so it's better to use it than rely on a custom implementation which is
kind of hard to get entirely right.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The line we are parsing looks like this
> flags : fpu vme de pse <...>
so look for "flags" as a prefix, not substring.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
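A sketch of the prefix check described above, using a hypothetical parseFlagsLine helper. The "vmx flags" line in the demo is only an illustration of a line that contains "flags" as a substring without being the line we want.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlagsLine (hypothetical name) returns the CPU flags if line is the
// "flags : ..." line of /proc/cpuinfo. Matching on the prefix avoids
// picking up other lines that merely contain the word "flags".
func parseFlagsLine(line string) ([]string, bool) {
	if !strings.HasPrefix(line, "flags") {
		return nil, false
	}
	parts := strings.SplitN(line, ":", 2)
	if len(parts) != 2 {
		return nil, false
	}
	return strings.Fields(parts[1]), true
}

func main() {
	lines := []string{
		"vmx flags\t: vnmi preemption_timer", // substring match only
		"flags\t\t: fpu vme de pse",          // the line we want
	}
	for _, l := range lines {
		flags, ok := parseFlagsLine(l)
		fmt.Println(ok, flags)
	}
}
```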
The Err() method should be called after the Scan() loop, not inside it.
Found by
git grep -A3 -F '.Scan()' | grep Err
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This patch fixes a corner case when destroying a container:
If we start a container without the 'intelRdt' config set, and then run
"runc update --l3-cache-schema/--mem-bw-schema", the 'intelRdt' config is
added implicitly.
Now if we enter "exit" from inside the container, we will pass through
linuxContainer.Destroy() -> state.destroy() -> intelRdtManager.Destroy().
But in IntelRdtManager.Destroy(), IntelRdtManager.Path is still an empty
string, as it hasn't been initialized yet. As a result, the rdt group
directory created during "runc update" will not be removed as expected.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
The MBA Software Controller feature was introduced in Linux kernel v4.18.
It is a software enhancement to mitigate some limitations in MBA which
are described in the kernel documentation. It also makes the interface
more user friendly: we can specify memory bandwidth in "MBps" (megabytes
per second) as well as in "percentages".
The kernel underneath uses a software feedback mechanism, or
"Software Controller", which reads the actual bandwidth using MBM
counters and adjusts the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".
We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.
For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
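To make the unified schema format concrete, here is a sketch of a parser for it. The parseMemBwSchema function is hypothetical and is not how runc itself handles the string. It only splits the schema into cache id and value pairs; whether a value means a percentage or MBps depends on the mba_MBps mount option and is not decided here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemBwSchema (hypothetical name) parses a schema of the form
// "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." into a map from
// cache id to bandwidth value. The unit of the value is left to the caller.
func parseMemBwSchema(s string) (map[int]uint64, error) {
	if !strings.HasPrefix(s, "MB:") {
		return nil, fmt.Errorf("schema %q does not start with \"MB:\"", s)
	}
	out := make(map[int]uint64)
	for _, item := range strings.Split(strings.TrimPrefix(s, "MB:"), ";") {
		kv := strings.SplitN(item, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", item)
		}
		id, err := strconv.Atoi(strings.TrimSpace(kv[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %w", item, err)
		}
		bw, err := strconv.ParseUint(strings.TrimSpace(kv[1]), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad bandwidth in %q: %w", item, err)
		}
		out[id] = bw
	}
	return out, nil
}

func main() {
	fmt.Println(parseMemBwSchema("MB:0=20;1=70"))     // percentages by default
	fmt.Println(parseMemBwSchema("MB:0=5000;1=7000")) // MBps with mba_MBps
}
```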
Linux kernel v4.15 introduced better diagnostics for Intel RDT operation
errors. If an error is returned when making new directories or writing to
any of the control files in the resctrl filesystem, reading the file
/sys/fs/resctrl/info/last_cmd_status can provide more information than
can be conveyed in the errors returned from file operations.
Some examples:
echo "L3:0=f3;1=ff" > /sys/fs/resctrl/container_id/schemata
-bash: echo: write error: Invalid argument
cat /sys/fs/resctrl/info/last_cmd_status
mask f3 has non-consecutive 1-bits
echo "MB:0=0;1=110" > /sys/fs/resctrl/container_id/schemata
-bash: echo: write error: Invalid argument
cat /sys/fs/resctrl/info/last_cmd_status
MB value 0 out of range [10,100]
cd /sys/fs/resctrl
mkdir 1 2 3 4 5 6 7 8
mkdir: cannot create directory '8': No space left on device
cat /sys/fs/resctrl/info/last_cmd_status
out of CLOSIDs
See 'last_cmd_status' for more details in kernel documentation:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
In runc, we could append the diagnostics information to the error
message of Intel RDT operation errors to provide more user-friendly
information.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>