Linux kernel 3.8 was released this week, which reminded me to write about recent kernel changes that may help improve sydbox. Below is a short summary of new, and not so new, features, written mostly to get myself to stop slacking and start coding again.
Per-process namespace support
Per-process namespace support is complete as of linux-3.8. Namespaces give each process its own view of a resource. For example, one process might see one set of mountpoints, PID numbers, and network stack state, while a process in another namespace sees a different set. For more information see the Linux-3.8 Changes page on kernelnewbies and the Namespaces in Operation articles on LWN.
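To make the namespace idea concrete, here is a minimal sketch, assuming Linux 3.8 or later with unprivileged user namespaces enabled (many distributions and container runtimes restrict them). A forked child moves into new user and UTS namespaces with unshare(2) and changes its hostname; the parent then checks that its own hostname is untouched. The function name uts_namespace_demo and the hostname "sandboxed" are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Returns 1 if a hostname change made inside a
 * new UTS namespace stayed invisible to the parent, 0 if the system refuses
 * to create the namespaces, -1 on any other failure. */
int uts_namespace_demo(void)
{
    char before[256] = {0}, after[256] = {0};
    if (gethostname(before, sizeof(before) - 1) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Since Linux 3.8 an unprivileged process may create a user
         * namespace; in the same call it gets CAP_SYS_ADMIN over the new
         * UTS namespace. Restricted systems fail with one of these errnos. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) < 0)
            _exit(errno == EPERM || errno == EINVAL ||
                  errno == ENOSPC || errno == EUSERS ? 2 : 3);
        const char name[] = "sandboxed"; /* hypothetical hostname */
        if (sethostname(name, strlen(name)) < 0)
            _exit(errno == EPERM ? 2 : 3);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return 0;
    if (WEXITSTATUS(status) != 0 || gethostname(after, sizeof(after) - 1) < 0)
        return -1;
    /* The parent never left the original namespace, so it sees no change. */
    return strcmp(before, after) == 0 ? 1 : -1;
}
```

Call uts_namespace_demo() from a single-threaded program: it returns 1 when the isolation held and 0 when the kernel or policy refuses to create the namespaces. Passing CLONE_NEWUSER together with CLONE_NEWUTS is what lets an unprivileged process change the hostname of its own UTS namespace.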
Linux-3.8 also adds a ptrace(2) option that makes the kernel kill all tracees when their tracer exits. This is useful for ptrace(2) based sandboxes, for which a tracee that keeps running after its tracer is gone is a security risk. See the related commit for more information.
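The sketch below uses PTRACE_O_EXITKILL, the name glibc's <sys/ptrace.h> exports for this Linux 3.8 option. A hypothetical helper, exitkill_demo, forks a tracer, which in turn forks and traces a tracee; the tracer sets the option and exits without detaching. The tracee holds the only write end of a pipe, so the original process sees end-of-file once the kernel has killed it. This is an illustration of the kernel interface, not sydbox code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Returns 1 if the tracee died together with its
 * tracer, 0 if ptrace(2) or PTRACE_O_EXITKILL is unavailable, -1 otherwise. */
int exitkill_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t tracer = fork();
    if (tracer < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (tracer == 0) {
        pid_t tracee = fork();
        if (tracee < 0)
            _exit(3);
        if (tracee == 0) {
            /* Only the tracee keeps the write end open, so the pipe
             * reports EOF exactly when the tracee is gone. */
            close(fds[0]);
            alarm(10); /* safety net so a failed run does not linger */
            if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
                _exit(2);
            raise(SIGSTOP); /* wait for the tracer to set its options */
            for (;;)
                pause();
        }
        close(fds[0]);
        close(fds[1]);
        int status;
        if (waitpid(tracee, &status, 0) < 0)
            _exit(3);
        if (WIFEXITED(status))
            _exit(WEXITSTATUS(status) == 2 ? 2 : 3);
        if (ptrace(PTRACE_SETOPTIONS, tracee, NULL,
                   (void *)(long)PTRACE_O_EXITKILL) < 0) {
            int unavailable = (errno == EINVAL); /* pre-3.8 kernel */
            kill(tracee, SIGKILL);
            _exit(unavailable ? 2 : 3);
        }
        ptrace(PTRACE_CONT, tracee, NULL, NULL);
        _exit(0); /* exit without detaching: the kernel kills the tracee */
    }

    close(fds[1]);
    int status, result = -1;
    if (waitpid(tracer, &status, 0) == tracer && WIFEXITED(status)) {
        if (WEXITSTATUS(status) == 2) {
            result = 0;
        } else if (WEXITSTATUS(status) == 0) {
            struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
            char c;
            /* EOF within 2 seconds means the tracee was killed. */
            if (poll(&pfd, 1, 2000) == 1 && read(fds[0], &c, 1) == 0)
                result = 1;
        }
    }
    close(fds[0]);
    return result;
}
```

The 2-second poll(2) timeout is deliberately shorter than the tracee's alarm(10), so the safety net cannot mask a failure. A return value of 0 usually means ptrace(2) is blocked, for example by a restrictive Yama ptrace_scope or a container seccomp profile.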
Seccomp user filters
This is by far my favourite feature. Introduced with Linux kernel 3.5, seccomp mode 2 (also known as user filters) lets you add basic system call filters expressed as Berkeley Packet Filter (BPF) programs.
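As an illustration of what such a filter looks like, here is a minimal sketch using the raw prctl(2) interface rather than a helper library. It assumes an x86_64 or aarch64 build (the AUDIT_ARCH_* check must match the ABI) and, purely for demonstration, makes uname(2) fail with EPERM in a forked child. The helper names install_deny_filter and seccomp_errno_demo are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "set DEMO_AUDIT_ARCH for this architecture"
#endif

/* Hypothetical helper: make system call `nr` fail with `err`, allow the rest. */
static int install_deny_filter(int nr, int err)
{
    struct sock_filter filter[] = {
        /* Kill the process if the call uses an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Deny the chosen system call number with an errno. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    /* Required for unprivileged processes to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}

/* Returns 1 if uname(2) was denied with EPERM inside a filtered child,
 * 0 if filters could not be installed, -1 otherwise. */
int seccomp_errno_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_deny_filter(__NR_uname, EPERM) < 0)
            _exit(2);
        struct utsname u;
        _exit(uname(&u) < 0 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 2:  return 0;
    default: return -1;
    }
}
```

The architecture check comes first because system call numbers differ between ABIs; a filter that skipped it could be bypassed by switching ABI. The filter is installed in a child so the calling process stays unfiltered.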
Even though sydbox still has to use ptrace(2) for more sophisticated argument checking, this feature removes the need to stop the tracee on every system call entry and exit, which is a PITA, especially when tracing multithreaded programs. sydbox-1 takes advantage of this feature using SECCOMP_RET_TRACE, which makes the kernel notify the tracer through a new ptrace(2) event, only for the system calls the filter selects.
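The SECCOMP_RET_TRACE flow can be sketched as follows, under the same x86_64/aarch64 assumption. The tracee installs a filter that returns SECCOMP_RET_TRACE only for getppid(2), tagged with a hypothetical value of 42 in SECCOMP_RET_DATA; the tracer enables PTRACE_O_TRACESECCOMP and reads the tag with PTRACE_GETEVENTMSG when the PTRACE_EVENT_SECCOMP stop arrives. This illustrates the kernel interface and is not sydbox's actual code; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "set DEMO_AUDIT_ARCH for this architecture"
#endif
#define DEMO_TRACE_TAG 42 /* hypothetical value carried in SECCOMP_RET_DATA */

/* Tracee side: hand getppid(2) to the tracer, allow everything else. */
static void tracee_main(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE | DEMO_TRACE_TAG),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = { .len = sizeof(filter) / sizeof(filter[0]),
                               .filter = filter };
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
        _exit(2);
    raise(SIGSTOP); /* let the tracer set PTRACE_O_TRACESECCOMP first */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) < 0)
        _exit(2);
    syscall(SYS_getppid); /* stops in PTRACE_EVENT_SECCOMP */
    _exit(0);
}

/* Returns 1 if exactly one tagged seccomp event was seen and the tracee
 * exited cleanly, 0 if ptrace/seccomp is unavailable, -1 otherwise. */
int seccomp_trace_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        tracee_main();

    int status, events = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 2 ? 0 : -1;
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)(PTRACE_O_TRACESECCOMP | PTRACE_O_EXITKILL)) < 0 ||
        ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    while (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status)) {
        int sig = 0;
        if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8))) {
            unsigned long msg = 0;
            ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg);
            if (msg == DEMO_TRACE_TAG)
                events++;
        } else {
            sig = WSTOPSIG(status); /* pass ordinary signals through */
        }
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 && events == 1 ? 1 : -1;
}
```

Every other system call runs without stopping the tracee, which is the point: the tracer is only woken for the calls the filter selects. If no tracer has set PTRACE_O_TRACESECCOMP, a SECCOMP_RET_TRACE result makes the system call fail with ENOSYS instead.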
Here are some useful links:
- Using simple seccomp filters
- A library for seccomp filters
- vsftpd’s seccomp sandbox
- openssh’s seccomp filter
- seccomp filtering with systemd
PTRACE_SEIZE & PTRACE_INTERRUPT
Probably even older than seccomp user filters, these ptrace requests let the tracer attach to a tracee without trapping it or affecting its job control state (PTRACE_SEIZE), and later stop the seized tracee on demand (PTRACE_INTERRUPT). See http://thread.gmane.org/gmane.linux.kernel/1136930 for more information.
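Here is a minimal sketch of the two requests, assuming Linux 3.4 or later and a ptrace policy that lets a parent trace its own child. The hypothetical helper seize_interrupt_demo seizes a running child without stopping it, then uses PTRACE_INTERRUPT and checks for the PTRACE_EVENT_STOP that seized tracees report.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Returns 1 if PTRACE_INTERRUPT produced a
 * PTRACE_EVENT_STOP in a seized child, 0 if ptrace(2) is unavailable,
 * -1 on unexpected results. */
int seize_interrupt_demo(void)
{
    int status, result = -1;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause(); /* stand-in for a running program */
    }

    /* Attach without sending SIGSTOP; PTRACE_O_EXITKILL also cleans up
     * the child if this process dies early. */
    if (ptrace(PTRACE_SEIZE, child, NULL, (void *)(long)PTRACE_O_EXITKILL) < 0) {
        result = (errno == EPERM || errno == EIO) ? 0 : -1;
        goto out;
    }
    if (ptrace(PTRACE_INTERRUPT, child, NULL, NULL) < 0)
        goto out;
    if (waitpid(child, &status, 0) < 0)
        goto out;
    /* A seized tracee stopped by PTRACE_INTERRUPT reports
     * SIGTRAP | (PTRACE_EVENT_STOP << 8) in status >> 8. */
    if (WIFSTOPPED(status) &&
        (status >> 8) == (SIGTRAP | (PTRACE_EVENT_STOP << 8)))
        result = 1;
out:
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return result;
}
```

Unlike PTRACE_ATTACH, PTRACE_SEIZE sends no SIGSTOP, so the child keeps running, with its job control state untouched, until the tracer explicitly interrupts it.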