Linux kernel 3.8 has been released this week which reminded me to write about recent Linux kernel changes which may help in improving sydbox. Below is a short summary of new, and not so new, features merely to get myself to stop slacking and start coding again.

Per-process namespace support

Per-process namespace support is completed with linux-3.8. This feature provides a nice way to separate resources on a per-process basis, for example a process might see a set mountpoints, PID numbers, and network stack state, and a process in other namespace might see others. For more information see the Linux-3.8 Changes page on kernelnewbies and the Namespaces in Operation articles on LWN.

PTRACE_O_EXITKILL

New in linux-3.8, this ptrace(2) option makes the tracer send SIGKILL to tracees on exit. This is useful for ptrace(2) based sandboxes for which a resumed tracee is a security risk. See the related commit for more information.

SECCOMP_MODE_FILTER

This is by far my favourite feature. Introduced with Linux kernel 3.5 and also known as seccomp mode 2 or user filters this feature lets you add basic system call filters expressed as Berkeley Packet Filter programs. Even though sydbox still has to use ptrace(2) to do more sophisticated argument checking, this feature removes the need to stop the tracee on every system call entry and exit which is a PITA especially when tracing multithreaded programs. sydbox-1 takes advantage of this feature using SECCOMP_RET_TRACE which signals the tracer with the new ptrace(2) event PTRACE_EVENT_SECCOMP.

Here are some useful links:

PTRACE_SEIZE & PTRACE_INTERRUPT

Probably even older than seccomp user filters, these ptrace requests allow the tracer to attach to tracee without trapping it or affecting its job control states. See, http://thread.gmane.org/gmane.linux.kernel/1136930 for more information.