197 Commits

Author SHA1 Message Date
Mrunal Patel
9cda583235 Merge pull request #1832 from giuseppe/runc-drop-invalid-proc-destination-with-chroot
linux: drop check for /proc as invalid dest
2018-09-04 09:26:21 -07:00
ChangFeng
3ce8fac7c4 libcontainer: add /proc/loadavg to the white list of bind mount
Signed-off-by: JunLi <lijun.git@gmail.com>
2018-08-30 21:30:23 +08:00
Giuseppe Scrivano
636b664027 linux: drop check for /proc as invalid dest
it is now allowed to bind mount /proc.  This is useful for rootless
containers when the PID namespace is shared with the host.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-08-30 09:56:18 +02:00
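
The /proc destination rules touched by 3ce8fac7c4 and 636b664027 can be sketched roughly as follows. This is a minimal illustration, not runc's actual code: the function name `procDestAllowed` and the map `allowedProcFiles` are hypothetical, and the whitelist contains only the three entries named in this log (loadavg, swaps, uptime); the real list may be longer.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedProcFiles is an illustrative whitelist (hypothetical name). Only the
// entries mentioned in the commit messages of this log are included.
var allowedProcFiles = map[string]bool{
	"loadavg": true,
	"swaps":   true,
	"uptime":  true,
}

// procDestAllowed reports whether dest, an absolute path inside the container,
// is an acceptable bind-mount destination under the rules sketched here:
// anything outside /proc is accepted, /proc itself is accepted, and a path
// below /proc is accepted only if it is whitelisted.
func procDestAllowed(dest string) bool {
	rel, err := filepath.Rel("/proc", filepath.Clean(dest))
	if err != nil {
		return false // e.g. a relative destination
	}
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return true // not under /proc at all
	}
	if rel == "." {
		return true // /proc itself
	}
	return allowedProcFiles[rel]
}

func main() {
	for _, d := range []string{"/proc", "/proc/loadavg", "/proc/kcore", "/etc/hosts"} {
		fmt.Printf("%-14s allowed=%v\n", d, procDestAllowed(d))
	}
}
```

Comparing the cleaned, relative path rather than a raw string prefix keeps a destination such as /procfoo from being mistaken for something under /proc.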
Daniel J Walsh
62a4763a7a When doing a copyup, /tmp can not be a shared mount point
MOVE_MOUNT will fail under certain situations.

You are not allowed to MS_MOVE if the parent directory is shared.

man mount
...
   The move operation
       Move a mounted tree to another place (atomically).  The call is:

              mount --move olddir newdir

       This  will cause the contents which previously appeared under olddir to
       now be accessible under newdir.  The physical location of the files  is
       not changed.  Note that olddir has to be a mountpoint.

       Note  also that moving a mount residing under a shared mount is invalid
       and unsupported.  Use findmnt -o TARGET,PROPAGATION to see the  current
       propagation flags.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-08-20 17:41:06 -04:00
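
The `findmnt -o TARGET,PROPAGATION` hint in the quoted man page can be approximated in code by reading the optional fields of a /proc/self/mountinfo line. The sketch below is an assumption-laden illustration, not runc's implementation: `parsePropagation` is a hypothetical helper that only looks for a `shared:N` tag, which is enough to tell whether MS_MOVE out of that mount would be refused.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePropagation extracts the mount point and the "shared" state from one
// line of /proc/self/mountinfo. The optional fields start at index 6 and run
// until the "-" separator; a "shared:N" tag marks a shared mount.
func parsePropagation(line string) (mountPoint string, shared bool, err error) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return "", false, fmt.Errorf("short mountinfo line: %q", line)
	}
	mountPoint = fields[4]
	for _, f := range fields[6:] {
		if f == "-" {
			return mountPoint, shared, nil
		}
		if strings.HasPrefix(f, "shared:") {
			shared = true
		}
	}
	return "", false, fmt.Errorf("no separator in mountinfo line: %q", line)
}

func main() {
	// Sample line made up for illustration.
	line := "22 1 0:20 / /tmp rw,nosuid shared:5 - tmpfs tmpfs rw"
	mp, shared, err := parsePropagation(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s shared=%v\n", mp, shared)
}
```

If the parent of the copyup directory turns out to be shared, the caller would have to change its propagation (or pick another parent) before attempting the move.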
Mrunal Patel
26ec8a9783 Revert "libcontainer/rootfs_linux: minor cleanup"
This reverts commit 1b27db67f1.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-08-14 15:50:18 -07:00
Bin Chen
1b27db67f1 libcontainer/rootfs_linux: minor cleanup
Move the variable closer to where it is used.

Signed-off-by: Bin Chen <nk@devicu.com>
2018-04-16 22:25:48 +10:00
Daniel J Walsh
43aea05946 Label the masked tmpfs with the mount label
Currently if a confined container process tries to list these directories
AVC's are generated because they are labeled with external labels.  Adding
the mountlabel will remove these AVC's.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-03-09 14:29:06 -05:00
Michael Crosby
91ca331474 chroot when no mount namespaces is provided
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-01-25 11:36:37 -05:00
Vincent Demeester
3ca4c78b1a Import docker/docker/pkg/mount into runc
This will help get rid of docker/docker dependency in runc 👼

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-11-08 16:25:58 +01:00
Vincent Demeester
594501475e Use cyphar/filepath-securejoin instead of docker pkg/symlink
runc shouldn't depend on docker and be more self-contained.
Removing github.com/pkg/symlink dep is the first step to not depend on docker anymore

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-10-31 16:53:45 +01:00
Aleksa Sarai
2430a98e64 merge branch 'pr-1500'
rootfs: switch ms_private remount of oldroot to ms_slave

LGTMs: @crosbymichael @hqhq
Closes opencontainers/runc#1500
2017-10-14 09:32:59 +11:00
Akihiro Suda
2edd36fdff libcontainer: create Cwd when it does not exist
The benefit for doing this within runc is that it works well with
userns.
Actually, runc already does the same thing for mount points.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2017-10-05 05:31:46 +00:00
Tycho Andersen
66eb2a3e8f fix --read-only containers under --userns-remap
The documentation here:
https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations

says that readonly containers can't be used with user namespaces due to some
kernel restriction. In fact, there is a special case in the kernel to be
able to do stuff like this, so let's use it.

This takes us from:

ubuntu@docker:~$ docker run -it --read-only ubuntu
docker: Error response from daemon: oci runtime error: container_linux.go:262: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:125: remounting \\\"/dev\\\" as readonly caused \\\"operation not permitted\\\"\"".

to:

ubuntu@docker:~$ docker-runc --version
runc version 1.0.0-rc4+dev
commit: ae2948042b08ad3d6d13cd09f40a50ffff4fc688-dirty
spec: 1.0.0
ubuntu@docker:~$ docker run -it --read-only ubuntu
root@181e2acb909a:/# touch foo
touch: cannot touch 'foo': Read-only file system

Signed-off-by: Tycho Andersen <tycho@docker.com>
2017-08-24 16:43:21 -06:00
Aleksa Sarai
117c92745b rootfs: switch ms_private remount of oldroot to ms_slave
Using MS_PRIVATE meant that there was a race between the mount(2) and
the umount2(2) calls where runc inadvertently has a live reference to a
mountpoint that existed on the host (which the host cannot kill
implicitly through an unmount and peer sharing).

In particular, this means that if we have a devicemapper mountpoint and
the host is trying to delete the underlying device, the delete will fail
because it is "in use" during the race. While the race is _very_ small
(and libdm actually retries to avoid these sorts of cases) this appears
to manifest in various cases.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-06-29 01:20:23 +10:00
Christy Perez
3d7cb4293c Move libcontainer to x/sys/unix
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.

There are still some dependencies on the syscall package that will
remain in syscall for the foreseeable future:

Errno
Signal
SysProcAttr

Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:35:20 -05:00
Qiang Huang
96e0df7633 Fix comments about when to pivot_root
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-05-06 07:59:03 +08:00
Daniel, Dao Quang Minh
13a8c5d140 Merge pull request #1365 from hqhq/use_go_selinux
Use opencontainers/selinux package
2017-04-15 14:22:32 +01:00
Aleksa Sarai
baeef29858 rootless: add rootless cgroup manager
The rootless cgroup manager acts as a noop for all set and apply
operations. It is just used for rootless setups. Currently this is far
too simple (we need to add opportunistic cgroup management), but is good
enough as a first-pass at a noop cgroup manager.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
Qiang Huang
5e7b48f7c0 Use opencontainers/selinux package
It has been split out into a separate project.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-23 08:21:19 +08:00
Ma Shimiao
06e27471bb support create device with type p and u
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-02-10 14:45:15 +08:00
Qiang Huang
45a8341811 Small cleanup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-08 15:09:06 +08:00
Antonio Murdaca
ca14e7b463 libcontainer: rootfs_linux: support overlayfs
As the runtime-spec allows it, we want to be able to specify overlayfs
mounts with:

    {
        "destination": "/etc/pki",
        "type": "overlay",
        "source": "overlay",
        "options": [
            "lowerdir=/etc/pki:/home/amurdaca/go/src/github.com/opencontainers/runc/rootfs_fedora/etc/pki"
        ]
    },

This patch takes care of allowing overlayfs mounts. Both RO and RW
should be supported.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 19:43:24 +01:00
Qiang Huang
0599ac7d93 Do not create cgroup dir name from combining subsystems
On some systems, when we mount some cgroup subsystems into
a same mountpoint, the name sequence of mount options and
cgroup directory name may differ.

For example, the mount option is cpuacct,cpu, but
mountpoint name is /sys/fs/cgroup/cpu,cpuacct. In current
runc, we set mount destination name from combining
subsystems, which comes from mount option from
/proc/self/mountinfo, so in my case the name would be
/sys/fs/cgroup/cpuacct,cpu, which is different from
host, and will break some applications.

Fix it by using directory name from host mountpoint.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-11 15:27:58 +08:00
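
The difference between the two naming strategies described in 0599ac7d93 can be illustrated with a small sketch. The helper names `nameFromSubsystems` and `nameFromHostMountpoint` are hypothetical; the paths and subsystem order are the ones given in the commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// nameFromSubsystems reproduces the old behaviour described above: the
// directory name is built by joining the subsystems in mount-option order.
func nameFromSubsystems(subsystems []string) string {
	return strings.Join(subsystems, ",")
}

// nameFromHostMountpoint follows the fix: reuse the last path element of the
// host mountpoint so the container sees the same name as the host.
func nameFromHostMountpoint(mountpoint string) string {
	return filepath.Base(mountpoint)
}

func main() {
	subsystems := []string{"cpuacct", "cpu"}         // order taken from the mount options
	hostMountpoint := "/sys/fs/cgroup/cpu,cpuacct" // name used on the host

	fmt.Println("old:", filepath.Join("/sys/fs/cgroup", nameFromSubsystems(subsystems)))
	fmt.Println("new:", filepath.Join("/sys/fs/cgroup", nameFromHostMountpoint(hostMountpoint)))
}
```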
Justin Cormack
50acb55233 Split the code for remounting mount points and mounting paths.
A remount of a mount point must include all the current flags or
these will be cleared:

```
The mountflags and data arguments should match the values used in the
original mount() call, except for those parameters that are being
deliberately changed.
```

The current code does not do this; the bug manifests in the specified
flags for `/dev` being lost on remount read only at present. As we
need to specify flags, split the code path for this from remounting
paths which are not mount points, as these can only inherit the
existing flags of the path, and these cannot be changed.

In the bind case, remove extra flags from the bind remount. A bind
mount can only be remounted read only, no other flags can be set,
all other flags are inherited from the parent. From the man page:

```
Since Linux 2.6.26, this flag can also be used to make an existing
bind mount read-only by specifying mountflags as:

MS_REMOUNT | MS_BIND | MS_RDONLY

Note that only the MS_RDONLY setting of the bind mount can be changed
in this manner.
```

MS_REC can only be set on the original bind, so move this. See note
in man page on bind mounts:

```
The remaining bits in the mountflags argument are also ignored, with
the exception of MS_REC.
```

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-12-16 14:01:17 -08:00
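
The flag rules quoted in 50acb55233 can be summarised in a few lines of Go. This is only a sketch of the flag arithmetic: the constants are local copies of the Linux MS_* values (so the example builds without the syscall package), and the three function names are hypothetical rather than taken from runc.

```go
package main

import "fmt"

// Local mirrors of the Linux mount flag values from <sys/mount.h>.
const (
	msRdonly  uintptr = 0x1
	msNosuid  uintptr = 0x2
	msNodev   uintptr = 0x4
	msNoexec  uintptr = 0x8
	msRemount uintptr = 0x20
	msBind    uintptr = 0x1000
	msRec     uintptr = 0x4000
)

// initialBindFlags returns the flags for the original bind mount. MS_REC is
// only meaningful here, not on a later remount.
func initialBindFlags(recursive bool) uintptr {
	flags := msBind
	if recursive {
		flags |= msRec
	}
	return flags
}

// readOnlyRemountOfMountPoint keeps every flag the mount already has and adds
// MS_REMOUNT|MS_RDONLY, so that flags such as nosuid are not cleared.
func readOnlyRemountOfMountPoint(current uintptr) uintptr {
	return current | msRemount | msRdonly
}

// readOnlyRemountOfBind uses exactly the combination quoted from the man
// page; no other flags can be changed on a bind remount.
func readOnlyRemountOfBind() uintptr {
	return msRemount | msBind | msRdonly
}

func main() {
	devFlags := msNosuid | msNoexec // example flags for a /dev-like mount
	fmt.Printf("mount point remount: %#x\n", readOnlyRemountOfMountPoint(devFlags))
	fmt.Printf("bind remount:        %#x\n", readOnlyRemountOfBind())
	fmt.Printf("initial rbind:       %#x\n", initialBindFlags(true))
}
```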
Aleksa Sarai
244c9fc426 *: console rewrite
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.

In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.

We use two-stage synchronisation to ensure that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancillary information.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Qiang Huang
b15668b36d Fix all typos found by misspell
I use the same tool (https://github.com/client9/misspell)
as Daniel used a few days ago, don't know why he missed these
typos at that time.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 14:14:42 +08:00
Vivek Goyal
6c147f8649 Make parent mount private before bind mounting rootfs
This reverts part of the commit eb0a144b5e

That commit introduced two issues.

- We need to make parent mount of rootfs private before bind mounting
  rootfs. Otherwise bind mounting root can propagate in other mount
  namespaces. (If parent mount is shared).

- It broke test TestRootfsPropagationSharedMount() on Fedora.

  On fedora /tmp is a mount point with "shared" propagation. I think
  you should be able to reproduce it on other distributions as well
  as long as you mount tmpfs on /tmp and make it "shared" propagation.

  The reason for the failure is that pivot_root() fails, because the
  kernel performs the following check:

  IS_MNT_SHARED(new_mnt->mnt_parent)

  Say /tmp/foo is the new rootfs. We have bind mounted the rootfs, so new_mnt
  is /tmp/foo, and new_mnt->mnt_parent is /tmp, which is "shared" on
  Fedora, so the above check is true and pivot_root() fails.

So this change broke a few things; it is a good idea to revert part of it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2016-10-25 11:15:11 -04:00
Aleksa Sarai
c7ed2244f4 merge branch 'pr-1125'
LGTMs: @hqhq @mrunalp
Closes #1125
2016-10-25 10:05:28 +11:00
Alexander Morozov
1ab9d5e6f4 Merge pull request #845 from mrunalp/cp_tmpfs
Add support for copying up directories into tmpfs when a tmpfs is mounted over them
2016-10-21 13:47:16 -07:00
Aleksa Sarai
f8e6b5af5e rootfs: make pivot_root not use a temporary directory
Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually a feature and allows you to make the
old_root be tied to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers, who came up with this idea
first.

This is the first of many steps toward allowing runC to work with a
completely read-only rootfs.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-20 12:55:58 +11:00
Daniel Dao
1b876b0bf2 fix typos with misspell
pipe the source through https://github.com/client9/misspell. typos be gone!

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2016-10-11 23:22:48 +00:00
Mrunal Patel
c7406f7075 Support copyup mount extension for tmpfs mounts
If copyup is specified for a tmpfs mount, then the contents of the
underlying directory are copied into the tmpfs mounted over it.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-04 11:26:30 -07:00
Michael Crosby
70b16a5ab9 Remove check for binding to /
In order to mount root filesystems inside the container's mount
namespace as part of the spec we need to have the ability to do a bind
mount to / as the destination.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-29 15:26:09 -07:00
Akihiro Suda
53179559a1 MaskPaths: support directory
For example, the /sys/firmware directory should be masked because it can contain some sensitive files:
  - /sys/firmware/acpi/tables/{SLIC,MSDM}: Windows license information
  - /sys/firmware/ibft/target0/chap-secret: iSCSI CHAP secret

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2016-09-23 16:14:41 +00:00
Mrunal Patel
f557996401 Add flag to allow getting all mounts for cgroups subsystems
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-15 15:19:27 -04:00
Serge Hallyn
52a8873f62 checkMountDestination: add swaps and uptime to /proc whitelist
Signed-off-by: Serge Hallyn <serge@hallyn.com>
2016-08-14 18:32:39 -05:00
Haiyan Meng
f40fbcd595 Fix the err info of mount failure
Signed-off-by: Haiyan Meng <haiyanalady@gmail.com>
2016-08-08 11:58:28 -04:00
Aleksa Sarai
c29695ad0a rootfs: don't change directory
There's no point in changing directory here. Syscalls are resolved local
to the linkpath, not to the current directory that the process was in
when creating the symlink. Changing directories just confuses people who
are trying to debug things.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-24 16:44:40 +10:00
Aleksa Sarai
0f1d6772c6 libcontainer: rootfs: use CleanPath when comparing paths
Comparisons with paths aren't really a good idea unless you're
guaranteed that the comparison will work with all paths that resolve to
the same lexical path as the compared path.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-22 01:45:32 +10:00
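
Why raw string comparison of paths is fragile can be shown with a short sketch. It uses the standard library's `filepath.Clean` as a stand-in for runc's own CleanPath helper, whose exact behaviour is not shown in this log; `samePath` is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// samePath compares two paths after lexical cleaning, so that spellings such
// as "/dev/" and "/dev/../dev" are treated as equal to "/dev". It does not
// resolve symlinks.
func samePath(a, b string) bool {
	return filepath.Clean(a) == filepath.Clean(b)
}

func main() {
	for _, p := range []string{"/dev", "/dev/", "/dev/../dev", "/devices"} {
		fmt.Printf("%-12s raw=%v cleaned=%v\n", p, p == "/dev", samePath(p, "/dev"))
	}
}
```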
Aleksa Sarai
e991f041a1 Revert "Need to make sure labels applied to /dev"
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-05-11 23:28:01 +10:00
Dan Walsh
77f312c51c Need to make sure labels applied to /dev
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-05-02 08:17:49 -04:00
Tatsushi Inagaki
eb0a144b5e Rootfs: reduce redundant parsing of mountinfo
Postpone parsing mountinfo until pivot_root() actually failed

Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
2016-04-22 09:41:28 +09:00
Michael Crosby
27fd0575ee Merge pull request #763 from mrunalp/userns_cgroups_ro
Allow mounting cgroups as read-only when user namespace is configured
2016-04-19 10:36:00 -07:00
Mrunal Patel
a6104c3bbe Allow mounting cgroups as read-only when user namespace is configured
We use bind mount to achieve this as other file system remounts are disallowed
in a user namespace.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-19 10:12:09 -07:00
Michael Crosby
6978875298 Add cause to error messages
This is the initial port of libcontainer.Error to add a cause to
all the existing error messages.  Going forward, when an error can be
wrapped because it is not being checked at the higher levels for
something like `os.IsNotExist`, we can add more information to the error
message, like cause and stack file/line information.  This will help
higher level tools to know what caused a container start or operation to
fail.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-04-18 11:37:26 -07:00
Akihiro Suda
1829531241 Fix trivial style errors reported by go vet and golint
No substantial code change.
Note that some style errors reported by `golint` are not fixed due to possible compatibility issues.

Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
2016-04-12 08:13:16 +00:00
Akihiro Suda
42234a85d1 Fix setupDev logic in rootfs_linux.go
setupDev was introduced in #96, but broken since #536 because spec 0.3.0 introduced default devices.

Fix #80 again
Fix docker/docker#21808

Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-04-11 10:29:40 -07:00
Thomas Tanaka
55aabc142c Only perform mount labelling when necessary
Do label mqueue when mounting it with label failed/not supported.

Signed-off-by: Thomas Tanaka <thomas.tanaka@oracle.com>
2016-03-24 13:38:18 -07:00
Mrunal Patel
64d87ebdec Merge pull request #585 from crosbymichael/dev-remountro
Remount /dev as ro after it is populated
2016-02-27 00:31:40 -08:00
Michael Crosby
c5a34a6fe2 Allow extra mount types
This allows the mount syscall to validate the additional types where we
do not have to perform extra validation, and it is up to the consumer to
verify the functionality of the type of device they are trying to
mount.

Fixes #572

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-02-26 15:21:33 -08:00