Commit Graph

10 Commits

Author SHA1 Message Date
Aleksa Sarai
8da42aaec2 sync: split init config (stream) and synchronisation (seqpacket) pipes
We have different requirements for the initial configuration and
initWaiter pipe (just send netlink and JSON blobs with no complicated
handling needed for message coalescing) and the packet-based
synchronisation pipe.

Tests with switching everything to SOCK_SEQPACKET lead to endless issues
with runc hanging on start-up because random things would try to do
short reads (which SOCK_SEQPACKET will not allow and the Go stdlib
explicitly treats as a streaming source), so splitting it was the only
reasonable solution. Even doing somewhat dodgy tricks such as adding a
Read() wrapper which actually calls ReadPacket() and makes it seem like
a stream source doesn't work -- and is a bit too magical.

One upside is that doing it this way makes the difference between the
modes clearer -- INITPIPE is still used for initWaiter syncrhonisation
but aside from that all other synchronisation is done by SYNCPIPE.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-09-24 20:31:14 +08:00
Aleksa Sarai
f81ef1493d libcontainer: sync: cleanup synchronisation code
This includes quite a few cleanups and improvements to the way we do
synchronisation. The core behaviour is unchanged, but switching to
embedding json.RawMessage into the synchronisation structure will allow
us to do more complicated synchronisation operations in future patches.

The file descriptor passing through the synchronisation system feature
will be used as part of the idmapped-mount and bind-mount-source
features when switching that code to use the new mount API outside of
nsexec.c.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-08-15 19:54:24 -07:00
Aleksa Sarai
70f4e46e68 utils: use close_range(2) to close leftover file descriptors
close_range(2) is far more efficient than a readdir of /proc/self/fd and
then doing a syscall for each file descriptor.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-08-01 15:37:58 +10:00
Kir Kolyshkin
5516294172 Remove io/ioutil use
See https://golang.org/doc/go1.16#ioutil

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-14 13:46:02 -07:00
Kir Kolyshkin
d8da00355e *: add go-1.17+ go:build tags
Go 1.17 introduce this new (and better) way to specify build tags.
For more info, see https://golang.org/design/draft-gobuild.

As a way to seamlessly switch from old to new build tags, gofmt (and
gopls) from go 1.17 adds the new tags along with the old ones.

Later, when go < 1.17 is no longer supported, the old build tags
can be removed.

Now, as I started to use latest gopls (v0.7.1), it adds these tags
while I edit. Rather than to randomly add new build tags, I guess
it is better to do it once for all files.

Mind that previous commits removed some tags that were useless,
so this one only touches packages that can at least be built
on non-linux.

Brought to you by

        go1.17 fmt ./...

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:58:22 -07:00
Kir Kolyshkin
7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Aleksa Sarai
d463f6485b *: verify that operations on /proc/... are on procfs
This is an additional mitigation for CVE-2019-16884. The primary problem
is that Docker can be coerced into bind-mounting a file system on top of
/proc (resulting in label-related writes to /proc no longer happening).

While we are working on mitigations against permitting the mounts, this
helps avoid our code from being tricked into writing to non-procfs
files. This is not a perfect solution (after all, there might be a
bind-mount of a different procfs file over the target) but in order to
exploit that you would need to be able to tweak a config.json pretty
specifically (which thankfully Docker doesn't allow).

Specifically this stops AppArmor from not labeling a process silently
due to /proc/self/attr/... being incorrectly set, and stops any
accidental fd leaks because /proc/self/fd/... is not real.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-09-30 09:06:48 +10:00
Christy Perez
3d7cb4293c Move libcontainer to x/sys/unix
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.

There are still some dependencies on the syscall package that will
remain in syscall for the forseeable future:

Errno
Signal
SysProcAttr

Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:35:20 -05:00
Michael Crosby
00a0ecf554 Add separate console socket
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-03-16 10:23:59 -07:00
John Howard
9f80f3f181 Windows: Factor out CloseExecFrom
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-06-26 20:13:17 -07:00