Commit Graph

46 Commits

Author SHA1 Message Date
Kir Kolyshkin
a80e1217d2 libct/intelrdt: add Root()
Export getIntelRdtRoot function as Root.

This is needed by google/cadvisor, which is (ab)using GetIntelRdtPath,
removed by commit 7296dc1712.

While at it, do some minimal refactoring to always use Root()
internally, not relying on variable value. Other than that it's just
some renaming.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-07 20:23:21 -07:00
Kir Kolyshkin
dbb9fc03ae libct/*: remove linux build tag from some pkgs
Only some libcontainer packages can be built on non-linux platforms
(not that it makes sense, but at least go build succeeds). Let's call
these "good" packages.

For all other packages (i.e. ones that fail to build with GOOS other
than linux), it does not make sense to have linux build tag (as they
are broken already, and thus are not and can not be used on anything
other than Linux).

Remove linux build tag for all non-"good" packages.

This was mostly done by the following script, with just a few manual
fixes on top.

# Print a '|'-separated list of packages that build on non-Linux platforms.
function list_good_pkgs() {
	for pkg in $(find . -type d -print); do
		GOOS=freebsd go build "$pkg" 2>/dev/null \
		&& GOOS=solaris go build "$pkg" 2>/dev/null \
		&& echo "$pkg"
	done | sed -e 's|^\./||' | tr '\n' '|' | sed -e 's/|$//'
}

# Delete the "// +build linux" line from a file and reformat it.
function remove_tag() {
	sed -i -e '\|^// +build linux$|d' "$1"
	go fmt "$1"
}

SKIP="^($(list_good_pkgs))"
for f in $(git ls-files . | grep '\.go$'); do
	if echo "$f" | grep -qE "$SKIP"; then
		echo "skip $f"
		continue
	fi
	echo "proc $f"
	remove_tag "$f"
done

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:52:07 -07:00
Markus Lehtonen
9393700003 libcontainer/intelrdt: update code comments
Use the term "clos group" instead of "container_id group" as the group
that a container belongs to is not necessarily tied to its container id.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2021-08-20 07:47:07 +03:00
Markus Lehtonen
79d292b9ff libcontainer/intelrdt: verify ClosID existence
Check that the ClosID directory pre-exists if no L3 or MB schema has
been specified. Conform with the following line from runtime-spec
(config-linux):

  If closID is set, and neither of l3CacheSchema and memBwSchema are
  set, runtime MUST check if corresponding pre-configured directory
  closID is present in mounted resctrl. If such pre-configured directory
  closID exists, runtime MUST assign container to this closID and
  generate an error if directory does not exist.

Add a TODO note for verifying existing schemata against L3/MB
parameters.
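
The rule quoted above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: `checkClosID`, its parameters, and the temporary directory standing in for the resctrl mount are hypothetical and are not the code added by this commit.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkClosID is a hypothetical helper, not runc's actual function.
// resctrlRoot stands for the resctrl mount point (e.g. /sys/fs/resctrl).
func checkClosID(resctrlRoot, closID, l3Schema, mbSchema string) error {
	// The rule only applies when closID is set and both schemas are empty.
	if closID == "" || l3Schema != "" || mbSchema != "" {
		return nil
	}
	dir := filepath.Join(resctrlRoot, closID)
	fi, err := os.Stat(dir)
	if err != nil {
		return fmt.Errorf("pre-configured clos group %q not usable: %w", closID, err)
	}
	if !fi.IsDir() {
		return fmt.Errorf("%s is not a directory", dir)
	}
	return nil
}

func main() {
	// A temporary directory stands in for the resctrl mount.
	root, err := os.MkdirTemp("", "resctrl")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "gold"), 0o755); err != nil {
		panic(err)
	}
	fmt.Println(checkClosID(root, "gold", "", ""))          // <nil>
	fmt.Println(checkClosID(root, "silver", "", "") != nil) // true
}
```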

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2021-08-09 16:18:59 +03:00
Markus Lehtonen
17e3b41dd0 libcontainer/intelrdt: support ClosID parameter
Handle ClosID parameter of IntelRdt. Makes it possible to use
pre-configured classes/ClosIDs and avoid running out of available IDs
which easily happens with per-container classes.

Remove validator checks for empty L3CacheSchema and MemBwSchema fields
in order to be able to leave them empty, and only specify ClosID for
a pre-configured class.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2021-08-09 15:58:03 +03:00
Markus Lehtonen
7296dc1712 libcontainer/intelrdt: refactor clos path handling
Simplify the code and make path a property of the container (via
intelRdtManager).

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2021-08-09 15:58:03 +03:00
Kir Kolyshkin
0229a77a80 libcontainer/intelrdt: privatize some ids
These are not used anywhere outside of the package (I have also
checked the only external user of the package,
github.com/google/cadvisor).

No changes other than changing the case. The following
identifiers are now private:

 * IntelRdtTasks
 * NewLastCmdError
 * NewStats

Brought to you by gorename.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-28 12:45:28 -07:00
Kir Kolyshkin
8f8dfc498a libcontainer/intelrdt: move NewLastCmdError down
... the stack, so every caller will automatically benefit from it.

The only change that it causes is the user in
libcontainer/process_linux.go will get a better error message.

[v2: typo fix]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-28 12:45:28 -07:00
Kir Kolyshkin
00d1562967 libct/intelrdt: simplify NewLastCmdError
For errors that only have a string and an underlying error, using
fmt.Errorf with %w to wrap an error is sufficient.

In this particular case, the code is simplified, and now we have
unwrappable errors as a bonus (same could be achieved by adding
(*LastCmdError).Unwrap() method, but that's adding more code).
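
As a minimal sketch of the idea (not the actual runc code), wrapping with `%w` keeps the underlying error reachable. The names `errInvalid` and `wrap` are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalid is a hypothetical sentinel standing in for an underlying failure.
var errInvalid = errors.New("invalid argument")

// wrap adds context with %w, keeping the original error in the chain.
func wrap(op string, err error) error {
	return fmt.Errorf("%s: %w", op, err)
}

func main() {
	err := wrap("writing schemata", errInvalid)
	fmt.Println(err)                              // writing schemata: invalid argument
	fmt.Println(errors.Is(err, errInvalid))       // true
	fmt.Println(errors.Unwrap(err) == errInvalid) // true
}
```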

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-28 12:45:28 -07:00
Kir Kolyshkin
e0ce428bce libct/intelrdt: remove NotFoundError type
Initially, this was copied over from libcontainer/cgroups, where it made
sense as for cgroup v1 we have multiple controllers and mount points.

Here, we only have a single mount, so there's no need for the whole
type.

Replace all that with a simple error (which is currently internal since
the only user is our own test case).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-28 12:45:28 -07:00
Kir Kolyshkin
feff2c451e libct/intelrdt: fix potential nil dereference
In case getIntelRdtData() returns an error, d is set to nil.

In case the error returned is of NotFoundError type (which happens
if resctrl mount is not found in /proc/self/mountinfo), the function
proceeds to call d.join(), resulting in a nil deref and a panic.

In practice, this never happens in runc because of the checks in
intelrdt() function in libcontainer/configs/validate, which raises
an error in case any of the parameters are set in config but
the Intel RDT itself is not available (that includes checking
that the mount point is there).

Nevertheless, the code is wrong, and can result in nil dereference
if some external user uses Apply on a system without a resctrl mount.

Fix this by removing the exclusion.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-28 12:45:28 -07:00
Kir Kolyshkin
f6a0899b7f *: use errors.As and errors.Is
Do this for all errors except one from unix.*.

This fixes a bunch of errorlint warnings, like these

libcontainer/generic_error.go:25:15: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
	if le, ok := err.(Error); ok {
	             ^
libcontainer/factory_linux_test.go:145:14: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
	lerr, ok := err.(Error)
	            ^
libcontainer/state_linux_test.go:28:11: type assertion on error will fail on wrapped errors. Use errors.As to check for specific errors (errorlint)
	_, ok := err.(*stateTransitionError)
	         ^
libcontainer/seccomp/patchbpf/enosys_linux.go:88:4: switch on an error will fail on wrapped errors. Use errors.Is to check for specific errors (errorlint)
			switch err {
			^
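
The difference the linter is pointing at can be reproduced with a small sketch. The `stateError` type and the helper names below are hypothetical stand-ins, not the types touched by this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// stateError is a hypothetical error type used only for this illustration.
type stateError struct{ From, To string }

func (e *stateError) Error() string {
	return "invalid transition " + e.From + " -> " + e.To
}

// wrap adds context the way fmt.Errorf with %w does.
func wrap(err error) error { return fmt.Errorf("destroy: %w", err) }

// viaAssertion only inspects the outermost error value.
func viaAssertion(err error) bool {
	_, ok := err.(*stateError)
	return ok
}

// viaAs walks the whole chain of wrapped errors.
func viaAs(err error) bool {
	var se *stateError
	return errors.As(err, &se)
}

func main() {
	err := wrap(&stateError{From: "running", To: "stopped"})
	fmt.Println(viaAssertion(err), viaAs(err)) // false true
}
```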

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
adbac31d88 libct: fix errorlint warning about strconv.NumError
This one is tough as errorlint insists on using errors.Is, and the
latter is known to not work for Go 1.13 which we still support.

So, add a nolint annotation to suppress the warning, and a TODO to
address it later.

For intelrdt, we can do the same, but it is easier to reuse the very
same function from fscommon (note we can't use fscommon for other stuff
as it expects cgroupfs).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Kir Kolyshkin
92e8d9b91a libct/intelrdt: error message nits
An error from ioutil.WriteFile already contains file name, so there is
no need to duplicate that information.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 11:41:41 -07:00
Kenta Tada
25987d03e3 libcontainer/intelrdt: adjust the file mode
This commit adjusts the file mode to use the latest golang style
and also changes the file mode value in accordance with default.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2021-01-08 15:22:24 +09:00
Xiaochen Shen
325a74ddec libcontainer/intelrdt: rm init() from intelrdt.go
Use sync.Once to init Intel RDT when needed for a small speedup to
operations which do not require Intel RDT.
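
A minimal sketch of the pattern, assuming a hypothetical `probe` function in place of the real feature detection: the expensive check runs on first use only, instead of at package load time as with init().

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce   sync.Once
	rdtEnabled bool
	probeCount int // how many times the expensive probe actually ran
)

// probe is a hypothetical stand-in for the real detection work
// (reading /proc/cpuinfo, locating the resctrl mount, and so on).
func probe() bool {
	probeCount++
	return true
}

// isEnabled triggers the probe lazily, on first call only.
func isEnabled() bool {
	initOnce.Do(func() { rdtEnabled = probe() })
	return rdtEnabled
}

func main() {
	fmt.Println(isEnabled(), isEnabled(), probeCount) // true true 1
}
```

Code paths that never call `isEnabled` pay nothing, which is where the small speedup would come from.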

Simplify IntelRdtManager initialization in LinuxFactory.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-12-16 23:37:31 +08:00
Akihiro Suda
689513cc09 Merge pull request #2643 from xiaochenshen/rdt-cmt-check
libcontainer/intelrdt: fix CMT feature check
2020-11-18 01:56:22 +09:00
Xiaochen Shen
f62ad4a0de libcontainer/intelrdt: rename CAT and MBA enabled flags
Rename CAT and MBA enabled flags to be consistent with others.
No functional change.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-11-10 15:32:01 +08:00
Xiaochen Shen
620f4c5c88 libcontainer/intelrdt: fix CMT feature check
Intel RDT sub-features can be selectively disabled or enabled by kernel
command line. See "rdt=" option details in kernel document:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt

However, the Cache Monitoring Technology (CMT) feature is currently not
checked correctly in init() and getCMTNumaNodeStats(). If CMT is disabled by
kernel command line (e.g., rdt=!cmt,mbmtotal,mbmlocal,l3cat,mba) while the
hardware supports CMT, we may get the following error when getting Intel RDT stats:
  runc run c1
  runc events c1
  ERRO[0005] container_linux.go:200: getting container's Intel RDT stats
  caused: open /sys/fs/resctrl/c1/mon_data/mon_L3_00/llc_occupancy: no
  such file or directory

Fix CMT feature check in init() and GetStats() call paths.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-11-10 10:36:20 +08:00
Xiaochen Shen
933c4d31a7 libcontainer/intelrdt: privatize IntelRdtManager and its fields
No functional change.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-10-17 16:59:42 +08:00
Xiaochen Shen
2c004a101e libcontainer/intelrdt: introduce NewManager()
Introduce NewManager() to wrap up IntelRdtManager initialization. And
call it when required.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2020-10-17 16:59:20 +08:00
Akihiro Suda
bb539a9965 Merge pull request #2628 from thaJeztah/linting_foo
fix some linting issues
2020-10-06 20:10:40 +09:00
Mrunal Patel
0d9b0dfc46 Merge pull request #2626 from kolyshkin/mountinfo-0.3.1
vendor: bump mountinfo v0.3.1
2020-10-02 15:11:45 -07:00
Sebastiaan van Stijn
e8eb8000f1 fix some linting issues
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-02 10:21:54 +02:00
Kir Kolyshkin
87412ee435 vendor: bump mountinfo v0.3.1
It contains some breaking changes, so fix the code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-01 18:51:25 -07:00
Sebastiaan van Stijn
28b452bf65 libcontainer: unconvert
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-01 18:36:56 +02:00
Kir Kolyshkin
2c70d23840 libct/intelrdt: add TestFindIntelRdtMountpointDir
Heavily based on work by Paweł Szulik <pawel.szulik@intel.com>

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-29 09:21:32 -07:00
Kir Kolyshkin
f1c1fdf911 libcontainer/intelrdt: use moby/sys/mountinfo
It might be a tad slower but it is surely more correct and well maintained,
so it's better to use it than rely on a custom implementation which is
kind of hard to get entirely right.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-28 20:18:01 -07:00
Paweł Szulik
799d94818d intelrdt: Add Cache Monitoring Technology stats
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
2020-04-25 09:43:48 +02:00
Paweł Szulik
d1e4c7b803 intelrdt: add mbm stats
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
2020-04-15 13:53:56 +02:00
Paweł Szulik
7fa13b2773 intelrdt: change parseCpuInfoFile to return struct
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
2020-04-08 23:03:36 +02:00
Kir Kolyshkin
aab2c8ba52 libcontainer/intelrdt: optimize parseCpuInfoFile
The line we are parsing looks like this

> flags		: fpu vme de pse <...>

so look for "flags" as a prefix, not substring.
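
A sketch of prefix matching on cpuinfo-style input follows. The function name `cpuFlags` and the sample input are assumptions for illustration, not the code from this commit. The sample includes a "vmx flags" line, which a substring match on "flags" would also pick up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cpuFlags is a hypothetical helper: it returns the set of flags from the
// first line that starts with "flags".
func cpuFlags(cpuinfo string) (map[string]bool, error) {
	flags := make(map[string]bool)
	s := bufio.NewScanner(strings.NewReader(cpuinfo))
	for s.Scan() {
		line := s.Text()
		// Prefix match: lines such as "vmx flags : ..." are skipped.
		if !strings.HasPrefix(line, "flags") {
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		for _, f := range strings.Fields(parts[1]) {
			flags[f] = true
		}
		break // one flags line is enough for this sketch
	}
	return flags, s.Err()
}

func main() {
	in := "vmx flags\t: vnmi\nflags\t\t: fpu rdt_a mba\n"
	flags, err := cpuFlags(in)
	fmt.Println(flags["rdt_a"], flags["mba"], flags["vnmi"], err) // true true false <nil>
}
```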

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-27 00:41:11 -07:00
Kir Kolyshkin
0af5cd2041 Nit: fix use of bufio.Scanner.Err
The Err() method should be called after the Scan() loop, not inside it.

Found by

 git grep -A3 -F '.Scan()' | grep Err
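
For illustration, here is the corrected loop shape in a small, self-contained sketch. `countLines` and its parameters are hypothetical; the small buffer limit is only there to provoke a scanner error.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// countLines is a hypothetical helper that counts lines, allowing at most
// maxLine bytes per line so that a failure is easy to trigger.
func countLines(input string, maxLine int) (int, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Buffer(make([]byte, 0, maxLine), maxLine)
	n := 0
	for s.Scan() {
		n++
	}
	// Scan returns false both at EOF and on failure; Err, checked once
	// after the loop, tells the two cases apart.
	return n, s.Err()
}

func main() {
	fmt.Println(countLines("a\nb\n", 16)) // 2 <nil>
	n, err := countLines("ok\n"+strings.Repeat("x", 100), 16)
	fmt.Println(n, errors.Is(err, bufio.ErrTooLong)) // 1 true
}
```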

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-27 00:12:17 -07:00
Kir Kolyshkin
a572216f74 libcontainer/intelrdt: rm fmt.Sprintf
It is not needed as it does nothing here.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 12:33:24 -07:00
Xiaochen Shen
acb75d0e38 libcontainer: intelrdt: fix null intelrdt path issue in Destroy()
This patch fixes a corner case when destroying a container:

Suppose we start a container without the 'intelRdt' config set, and then
run “runc update --l3-cache-schema/--mem-bw-schema” to add the 'intelRdt'
config implicitly.

If we then run "exit" inside the container, we pass through
linuxContainer.Destroy() -> state.destroy() -> intelRdtManager.Destroy().
But in IntelRdtManager.Destroy(), IntelRdtManager.Path is still an empty
string because it has not been initialized yet. As a result, the rdt
group directory created during "runc update" is not removed as expected.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2019-01-05 00:34:25 +08:00
JoeWrightss
769d6c4a75 Fix some typos
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-09 23:52:54 +08:00
Xiaochen Shen
95af9eff82 libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
The MBA Software Controller feature was introduced in Linux kernel v4.18.
It is a software enhancement that mitigates some limitations of MBA which
are described in the kernel documentation. It also makes the interface more
user friendly: we can specify memory bandwidth in "MBps" (megabytes per
second) as well as in "percentages".

The kernel underneath uses a software feedback mechanism, or
"Software Controller", which reads the actual bandwidth using MBM
counters and adjusts the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".

We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.

For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-11-13 23:27:08 +08:00
Xiaochen Shen
6c307f8ff2 libcontainer: intelrdt: add user-friendly diagnostics for Intel RDT operation errors
Linux kernel v4.15 introduced better diagnostics for Intel RDT operation
errors. If an error is returned when making new directories or writing to
any of the control files in the resctrl filesystem, reading the file
/sys/fs/resctrl/info/last_cmd_status can provide more information than
can be conveyed in the error returned from the file operation.

Some examples:
  echo "L3:0=f3;1=ff" > /sys/fs/resctrl/container_id/schemata
  -bash: echo: write error: Invalid argument
  cat /sys/fs/resctrl/info/last_cmd_status
  mask f3 has non-consecutive 1-bits

  echo "MB:0=0;1=110" > /sys/fs/resctrl/container_id/schemata
  -bash: echo: write error: Invalid argument
  cat /sys/fs/resctrl/info/last_cmd_status
  MB value 0 out of range [10,100]

  cd /sys/fs/resctrl
  mkdir 1 2 3 4 5 6 7 8
  mkdir: cannot create directory '8': No space left on device
  cat /sys/fs/resctrl/info/last_cmd_status
  out of CLOSIDs

See 'last_cmd_status' for more details in kernel documentation:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

In runc, we could append the diagnostics information to the error
message of Intel RDT operation errors to provide more user-friendly
information.
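
One way to picture this is the sketch below. The helper names (`readLastCmdStatus`, `withStatus`), the sentinel `errWrite`, and the temporary directory standing in for /sys/fs/resctrl are all assumptions for illustration and not the functions added by this commit. The sketch also assumes that the file reads "ok" after a successful command, as described in the kernel documentation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// errWrite is a hypothetical stand-in for a failed schemata write.
var errWrite = errors.New("write error: invalid argument")

// readLastCmdStatus returns the raw contents of <root>/info/last_cmd_status,
// or "" if the file cannot be read (e.g. on kernels older than v4.15).
func readLastCmdStatus(root string) string {
	data, err := os.ReadFile(filepath.Join(root, "info", "last_cmd_status"))
	if err != nil {
		return ""
	}
	return string(data)
}

// withStatus appends the kernel diagnostic to err when there is one.
func withStatus(err error, status string) error {
	status = strings.TrimSpace(status)
	if err == nil || status == "" || status == "ok" {
		return err
	}
	return fmt.Errorf("%w (last_cmd_status: %s)", err, status)
}

func main() {
	// A temporary directory stands in for the resctrl mount.
	root, err := os.MkdirTemp("", "resctrl")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "info"), 0o755); err != nil {
		panic(err)
	}
	msg := []byte("mask f3 has non-consecutive 1-bits\n")
	if err := os.WriteFile(filepath.Join(root, "info", "last_cmd_status"), msg, 0o644); err != nil {
		panic(err)
	}
	fmt.Println(withStatus(errWrite, readLastCmdStatus(root)))
}
```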

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-10-19 00:16:08 +08:00
Xiaochen Shen
d59b17d6d5 libcontainer: intelrdt: Add more check if sub-features are enabled
Double check if Intel RDT sub-features are available in the "resource
control" filesystem. Intel RDT sub-features can be selectively disabled
or enabled by kernel command line (e.g., rdt=!l3cat,mba) in kernel 4.14
and newer.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-10-16 14:29:44 +08:00
Xiaochen Shen
27560ace2f libcontainer: intelrdt: add support for Intel RDT/MBA in runc
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.

Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by the kernel
config option CONFIG_INTEL_RDT. If the hardware supports it, the CPU flags
`rdt_a` and `mba` are set in /proc/cpuinfo.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks

For MBA support in `runc`, we reuse the infrastructure and code
base of Intel RDT/CAT, which was implemented in #1279. We can also make
use of the `tasks` and `schemata` configuration for memory bandwidth
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g.,
the "<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.

The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.

Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which
contains L3 cache id and memory bandwidth percentage.
    Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."

The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
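
The step arithmetic can be sketched as below. This is an illustration under assumptions: it reads "rounded to the next control step" as rounding up, treats 100 as the upper bound, and the name `nextControlStep` is hypothetical. The actual rounding is done by the kernel, not by runc.

```go
package main

import (
	"errors"
	"fmt"
)

// nextControlStep is a hypothetical helper. It maps a requested bandwidth
// percentage to the next step of the form minBw + N*gran, capped at 100.
// minBw and gran correspond to info/MB/min_bandwidth and
// info/MB/bandwidth_gran.
func nextControlStep(requested, minBw, gran int) (int, error) {
	if gran <= 0 {
		return 0, errors.New("granularity must be positive")
	}
	if requested < minBw || requested > 100 {
		return 0, fmt.Errorf("MB value %d out of range [%d,100]", requested, minBw)
	}
	// Integer ceiling of (requested-minBw)/gran gives N.
	n := (requested - minBw + gran - 1) / gran
	step := minBw + n*gran
	if step > 100 {
		step = 100
	}
	return step, nil
}

func main() {
	fmt.Println(nextControlStep(25, 10, 10)) // 30 <nil>
	fmt.Println(nextControlStep(20, 10, 10)) // 20 <nil>
}
```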

For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth is 10% and the memory bandwidth granularity is 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-10-16 14:29:29 +08:00
Yan Zhu
feb90346e0 doc: fix typo
Signed-off-by: Yan Zhu <yanzhu@alauda.io>
2018-09-07 11:58:59 +08:00
Xiaochen Shen
d89217515b libcontainer: intelrdt: fix a GetStats() issue
This fixes a GetStats() issue introduced in #1590:
If Intel RDT is enabled by hardware and kernel, but intelRdt is not
specified in original config, GetStats() will return error unexpectedly
because we haven't called Apply() to create intelrdt group or attach
tasks for this container. As a result, runc events command will have no
output.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-10-17 17:37:07 +08:00
Xiaochen Shen
2549545df5 intelrdt: always init IntelRdtManager if Intel RDT is enabled
In the current implementation:
If either Intel RDT is not enabled by hardware and kernel, or intelRdt is
not specified in the original config, we don't init IntelRdtManager in the
container to handle the intelrdt constraint. This is a tradeoff, because
Intel RDT hardware supports only a limited number of groups.

This patch makes a minor change to support update command:
Whether or not intelRdt is specified in config, we always init
IntelRdtManager in the container if Intel RDT is enabled. If intelRdt is
not specified in original config, we just don't Apply() to create
intelrdt group or attach tasks for this container.

In update command, we could re-enable through IntelRdtManager.Apply()
and then update intelrdt constraint.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-20 01:37:31 +08:00
Xiaochen Shen
88d22fde40 libcontainer: intelrdt: use init() to avoid race condition
This is the follow-up PR to #1279, fixing the remaining issues:

Use init() to avoid race condition in IsIntelRdtEnabled().
Also rename some variables and functions.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-08 17:15:31 +08:00
Xiaochen Shen
692f6e1e27 libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPUs support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

More information about Intel RDT/CAT can be found in section 17.17
of the Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.

Compared with cgroups, it has a similar process management lifecycle and
similar interfaces in a container. But unlike the cgroups hierarchy, it has
a single-level filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g.,
the "<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from the
previous group to which they belonged). New tasks created by fork(2) and
clone(2) are added to the same group as their parent. If a pid is not in any
sub group, it is in the root group.

The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.

A valid L3 cache CBM is a *contiguous set of bits*, and the number of bits
that can be set does not exceed the maximum. The maximum number of bits in the
CBM varies among supported Intel Xeon platforms. In the Intel RDT "resource
control" filesystem layout, the CBM in a group should be a subset of the CBM in
root. The kernel checks that it is valid when it is written. E.g., 0xfffff in
root indicates that the maximum CBM length is 20 bits, which maps to the entire
L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff,
0x1f00, etc.
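
The two CBM rules stated above (contiguous bits, subset of the root mask) can be checked with a few bit operations. The sketch below is illustrative only: `validCBM` is a hypothetical name, it ignores `min_cbm_bits`, and the authoritative check is the one the kernel performs on write.

```go
package main

import (
	"fmt"
	"math/bits"
)

// validCBM is a hypothetical helper. It reports whether cbm is non-zero,
// lies within rootMask, and consists of one contiguous run of 1-bits.
func validCBM(cbm, rootMask uint64) bool {
	if cbm == 0 || cbm&^rootMask != 0 {
		return false
	}
	// Drop trailing zeros; a contiguous run then has the form 2^k - 1,
	// and x&(x+1) == 0 holds exactly for values of that form.
	run := cbm >> uint(bits.TrailingZeros64(cbm))
	return run&(run+1) == 0
}

func main() {
	root := uint64(0xfffff)
	for _, cbm := range []uint64{0xf, 0xf0, 0x3ff, 0x1f00, 0xf3} {
		fmt.Printf("%#x valid: %v\n", cbm, validCBM(cbm, root))
	}
}
```

With a 20-bit root mask, the four values listed above pass, while 0xf3 is rejected because its 1-bits are not consecutive.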

For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% of L3 cache id 1:

"linux": {
	"intelRdt": {
		"l3CacheSchema": "L3:0=ffff0;1=3ff"
	}
}
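
To see where the 80% and 50% figures come from, the schema string can be parsed and the set bits counted against the 20-bit maximum. This is a rough sketch with hypothetical helper names (`parseL3Schema`, `sharePercent`), not runc's parser.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// parseL3Schema is a hypothetical helper that turns
// "L3:<id0>=<cbm0>;<id1>=<cbm1>;..." into a map from cache id to CBM.
func parseL3Schema(schema string) (map[int]uint64, error) {
	if !strings.HasPrefix(schema, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", schema)
	}
	masks := make(map[int]uint64)
	for _, part := range strings.Split(strings.TrimPrefix(schema, "L3:"), ";") {
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", part)
		}
		id, err := strconv.Atoi(kv[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %w", part, err)
		}
		cbm, err := strconv.ParseUint(kv[1], 16, 64) // CBMs are hexadecimal
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %w", part, err)
		}
		masks[id] = cbm
	}
	return masks, nil
}

// sharePercent returns the share of the cache covered by cbm, in percent.
func sharePercent(cbm uint64, maxBits int) int {
	return bits.OnesCount64(cbm) * 100 / maxBits
}

func main() {
	masks, err := parseL3Schema("L3:0=ffff0;1=3ff")
	if err != nil {
		panic(err)
	}
	fmt.Println(sharePercent(masks[0], 20), sharePercent(masks[1], 20)) // 80 50
}
```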

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-01 14:26:33 +08:00