Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions bubblewrap.c
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,7 @@ typedef enum {
SETUP_OVERLAY_SRC,
SETUP_MOUNT_PROC,
SETUP_MOUNT_DEV,
SETUP_MOUNT_SYS,
SETUP_MOUNT_TMPFS,
SETUP_MOUNT_MQUEUE,
SETUP_MAKE_DIR,
Expand Down Expand Up @@ -175,6 +176,7 @@ enum {
PRIV_SEP_OP_BIND_MOUNT,
PRIV_SEP_OP_OVERLAY_MOUNT,
PRIV_SEP_OP_PROC_MOUNT,
PRIV_SEP_OP_SYS_MOUNT,
PRIV_SEP_OP_TMPFS_MOUNT,
PRIV_SEP_OP_DEVPTS_MOUNT,
PRIV_SEP_OP_MQUEUE_MOUNT,
Expand Down Expand Up @@ -352,6 +354,7 @@ usage (int ecode, FILE *out)
" --file-label LABEL File label for temporary sandbox content\n"
" --proc DEST Mount new procfs on DEST\n"
" --dev DEST Mount new dev on DEST\n"
" --sys DEST Mount new sysfs on DEST\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" --sys DEST Mount new sysfs on DEST\n"
" --sys DEST Mount new sysfs instance on DEST\n"

" --tmpfs DEST Mount new tmpfs on DEST\n"
" --mqueue DEST Mount new mqueue on DEST\n"
" --dir DEST Create dir at DEST\n"
Expand Down Expand Up @@ -1122,6 +1125,11 @@ privileged_op (int privileged_op_socket,
die_with_mount_error ("Can't mount proc on %s", arg1);
break;

case PRIV_SEP_OP_SYS_MOUNT:
if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you on a SELinux system? Because crablock will bind the host /sys/fs/selinux selinuxfs in the new sysfs. I need to check what was the exact reason for this, but it was the better choice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.
No, I am on a quite minimal, 'debootstrapped' Debian without AppArmor or SELinux.
So that would mean that if /sys/fs/selinux exists, it should be mounted on top of the new /sys in the container?

Copy link
Contributor

@rusty-snake rusty-snake Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that would mean that if /sys/fs/selinux exists, it should be mounted on top of the new /sys in the container?

Yes. If the host sysfs has a /sys/fs/selinux mounted, --sys should bind mount it on top of the container sysfs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the code over at crablock you are referencing:

https://codeberg.org/crabjail/crablock/src/commit/1a889d36177b0ff39e37a070405fb903c5f7e783/src/modules/mount_apifs.rs#L185

I guess I will have to spin up Fedora or something to test that.
As long as the container sysfs already contains the /sys/fs/selinux directory to serve as mountpoint, it should be smooth sailing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I have adapted the code to consider /sys/fs/selinux and successfully tested this under the latest Fedora 42 (with SELinux) and Debian (without SELinux), also as unprivileged user.

This gets me:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,nosuid,nodev,noexec,relatime)

Please, if you can, let me know what you think about it.

I test this with:

bwrap
--new-session
--die-with-parent
--as-pid-1
--unshare-all
--uid 0 --gid 0
--ro-bind ./rootfs /
--dev /dev
--proc /proc
--sys /sys
--hostname "alpine-test"
--clearenv
--setenv TERM "xterm-256color"
/bin/bash --login

And I generate the roofs with:
docker run --rm -v "$(pwd)/rootfs":/build alpine:3.22.1 sh -c 'apk --arch $(arch) -X https://dl-cdn.alpinelinux.org/alpine/latest-stable/main/ -X https://dl-cdn.alpinelinux.org/alpine/latest-stable/community/ -U --allow-untrusted --root /build --initdb add alpine-base bash iproute2 wireguard-tools net-tools htop psmisc bind-tools nano mg tcpdump netcat-openbsd iputils traceroute iptables strace util-linux'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is special about /sys/fs/selinux that it should always be bind-mounted automatically, especially if the other filesystems below /sys should not?

On my Debian system I have /sys/firmware/efi/efivars, /sys/kernel/security, /sys/fs/cgroup, /sys/fs/pstore, /sys/fs/bpf and more. Should those be bind-mounted too? If yes, why? If no, why not?

I think it might be better (and certainly easier to document) for --sys to only mount the sysfs, and if callers of bwrap (such as Flatpak or Glycin or whatever) want to bind-mount other /sys-based filesystems like /sys/fs/selinux, they can --bind them explicitly.

die_with_mount_error ("Can't mount sys on %s", arg1);
break;

case PRIV_SEP_OP_TMPFS_MOUNT:
{
cleanup_free char *mode = NULL;
Expand Down Expand Up @@ -1191,6 +1199,7 @@ setup_newroot (bool unshare_pid,
{
SetupOp *op;
int tmp_overlay_idx = 0;
struct stat sbuf;

for (op = ops; op != NULL; op = op->next)
{
Expand Down Expand Up @@ -1443,6 +1452,36 @@ setup_newroot (bool unshare_pid,

break;

case SETUP_MOUNT_SYS:
if (ensure_dir (dest, 0755) != 0)
die_with_error ("Can't mkdir %s", op->dest);

if (unshare_pid || opt_pidns_fd != -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sysfs is also about unshare_net. Also check this with your mount is too revealing error. There is a difference between --unshare-user --share-net and --unshare-user --unshare-net for sysfs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something that should be addressed, particularly if --unshare-net is involved in your original use-case for wanting this feature.

(Of course, if you change --sys so that it unconditionally mounts a new sysfs instance, then there will be no need to have this check at all.)

{
/* Our own sysfs */
privileged_op (privileged_op_socket,
PRIV_SEP_OP_SYS_MOUNT, 0, 0, 0,
dest, NULL);

/* In case the host utilizes SELinux, /sys/fs/selinux should be shared with the sandbox */
char *selinux_src_dir = "oldroot/sys/fs/selinux";
cleanup_free char *selinux_dest_dir = strconcat (dest, "/fs/selinux");
if (stat (selinux_src_dir, &sbuf) == 0 && stat (selinux_dest_dir, &sbuf) == 0)
{
privileged_op (privileged_op_socket,
PRIV_SEP_OP_BIND_MOUNT, 0, 0, 0,
selinux_src_dir, selinux_dest_dir);
}
}
else
{
/* Use system sysfs, as we share pid namespace anyway */
privileged_op (privileged_op_socket,
PRIV_SEP_OP_BIND_MOUNT, 0, 0, 0,
"oldroot/sys", dest);
}
Comment on lines +1476 to +1482
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite "do what I mean". I think it would be easier to document this option clearly if it always created a new sysfs instance, and never bind-mounted the existing /sys.

(Also easier to be sure that it's correct - if the option autodetects whether it needs a new instance or can keep the current instance, then that opens the possibility that the autodetection has bugs.)

break;

case SETUP_MOUNT_TMPFS:
assert (dest != NULL);
assert (op->perms >= 0);
Expand Down Expand Up @@ -1642,6 +1681,7 @@ resolve_symlinks_in_ops (void)
case SETUP_TMP_OVERLAY_MOUNT:
case SETUP_MOUNT_PROC:
case SETUP_MOUNT_DEV:
case SETUP_MOUNT_SYS:
case SETUP_MOUNT_TMPFS:
case SETUP_MOUNT_MQUEUE:
case SETUP_MAKE_DIR:
Expand Down Expand Up @@ -2100,6 +2140,17 @@ parse_args_recurse (int *argcp,
op = setup_op_new (SETUP_MOUNT_PROC);
op->dest = argv[1];

argv += 1;
argc -= 1;
}
else if (strcmp (arg, "--sys") == 0)
{
if (argc < 2)
die ("--sys takes an argument");

op = setup_op_new (SETUP_MOUNT_SYS);
op->dest = argv[1];

argv += 1;
argc -= 1;
}
Expand Down
4 changes: 4 additions & 0 deletions bwrap.xml
Original file line number Diff line number Diff line change
Expand Up @@ -405,6 +405,10 @@
<term><option>--dev <arg choice="plain">DEST</arg></option></term>
<listitem><para>Mount new devtmpfs on <arg choice="plain">DEST</arg></para></listitem>
</varlistentry>
<varlistentry>
<term><option>--sys <arg choice="plain">DEST</arg></option></term>
<listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs to expand on what this option does, and why it isn't just --bind /sys DEST.

For example perhaps something like

Suggested change
<listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem>
<listitem><para>Mount a new instance of sysfs on <arg choice="plain">DEST</arg>.
This is not necessarily the same as <literal>--bind /sys DEST</literal>:
for example, if used with <arg>--unshare-net</arg>,
the <filename>/sys/class/net</filename> in the new sysfs instance
will reflect the new network namespace,
whereas <literal>--bind /sys DEST</literal> would result in
a <filename>/sys/class/net</filename> that describes the original network
namespace.</para></listitem>

If --sys has the side-effect of bind-mounting other filesystems (which I think it should not) then the man page should mention that.

Conversely, if --sys does not have the side-effect of bind-mounting other filesystems and instead leaves it up to the caller of bwrap, then it would be good to say something like

This does not mount any filesystems that might exist below the original /sys, such as /sys/fs/selinux or /sys/kernel/tracing. Those filesystems should be bind-mounted separately, using --bind or similar, if desired.

</varlistentry>
<varlistentry>
<term><option>--tmpfs <arg choice="plain">DEST</arg></option></term>
<listitem>
Expand Down
1 change: 1 addition & 0 deletions completions/bash/bwrap
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ _bwrap() {
--size
--symlink
--sync-fd
--sys
--tmp-overlay
--uid
--unsetenv
Expand Down
1 change: 1 addition & 0 deletions completions/zsh/_bwrap
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@ _bwrap_args=(
'--size[Set size in bytes for next action argument]: :->after_size'
'--symlink[Create symlink at DEST with target SRC]:symlink target:_files:symlink to create:_files:'
'--sync-fd[Keep this fd open while sandbox is running]: :_guard "[0-9]#" "file descriptor to keep open"'
'--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps

Suggested change
'--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/'
'--sys[Mount new sysfs instance on DEST]:mount point for sysfs:_files -/'

to emphasize that, like --proc, this is not the same thing as bind-mounting the existing /sys.

'--uid[Custom uid in the sandbox (requires --unshare-user or --userns)]: :_guard "[0-9]#" "numeric group ID"'
'(--clearenv)--unsetenv[Unset an environment variable]:variable to unset:_parameters -g "*export*"'
'--unshare-all[Unshare every namespace we support by default]'
Expand Down