sysjail -- imprison a process and its descendants
sysjail [-bdhiv] [LONG_OPTIONS] [-4 ipv4] [-6 ipv6] [-f device] [-j file] -l [ -u username | -U username ] path hostname command ...
The sysjail utility imprisons ("jails") a process and its descendants. Processes in a jail have a restricted view of system resources.
A jailed environment will persist until all children have exited, an internal error occurs, or the sysjail process is signalled with a SIGTERM.
The following long options are also available (note that these may override behaviour described in this document):
Emulation
Linux and FreeBSD binaries may run via kernel emulation. If a process
changes into (or begins as) FreeBSD or Linux, it is afforded the same
protection as native binaries. This manual notes which systems support a
call whenever relevant. Note that the operating environment may cause
problems; for example, if a Linux system is using procfs, process filtering
is bypassed.
File-system Resources
When the primary child is forked, prior to execution, the jail environment
is secured by calls to chroot(2) and chdir(2) to the prison's root. Calls
to mount(2) and unmount(2) are denied to jailed processes with EPERM, as
are calls to mknod(2), the swapctl(2) function group, chflags(2),
fchflags(2), and lchflags(2) (NetBSD).
Interprocess Communication Resources
All System V Interface Definition, Fourth Edition (``SVID4'') interprocess
functions are denied with EPERM, including, but not limited to, msgctl(2)
(and related message passing functions), shmctl(2) (and related shared
memory functions), and semctl(2) (and related semaphore functions).
Compliance with these functions varies greatly among systems and
emulations; all are denied.
Network Resources
Calls to gethostname(2) (Linux, SunOS?) result in the jail's internal
hostname being returned. Security note: if one reads the returned buffer
beyond the terminating nul byte, the original value from the system call is
still resident. Compatibility note: although SUS states that the buffer is
not guaranteed to be nul-terminated, sysjail will always nul-terminate the
string. Calls to sethostname(2) (SunOS?) are denied with EPERM. Attempts to
get or set the hostname by sysctl(3) are similarly re-written or denied.
Network calls to bind(2) are restricted to AF_UNIX, AF_INET, and AF_INET6
(if running with the -6 flag). AF_INET addresses are re-written to the
jail's address, and if -6 has been specified, AF_INET6 addresses are also
re-written. Calls to socket(2) are filtered for AF_UNIX, AF_ROUTE,
AF_INET6, and AF_INET; other families are denied with EPROTONOSUPPORT.
SOCK_RAW and the IPPROTO_RAW and IPPROTO_ROUTING protocols are further
denied. These bind(2) and socket(2) restrictions are also followed on Linux
through socketcall(2). Most writable ioctl(2) network operations, those
found in sys/sockio.h, are denied with EPERM (adding network interfaces,
bridges, etc.).
Process Resources
Processes in a jail are denied access to processes not in the jail. All
system calls with PID inputs are filtered in this regard. The affected
system calls are setpriority(2), getpriority(2), setpgid(2), getpgid(2),
killpg(2) (Linux), kill(2), getsid(2), sched_setscheduler(2) (Linux,
FreeBSD), sched_getscheduler(2) (Linux, FreeBSD), sched_setparam(2) (Linux,
FreeBSD), sched_getparam(2) (Linux, FreeBSD), ptrace(2), fktrace(2)
(NetBSD), and ktrace(2). Note that getpgid(2), getppid(2), and getsid(2)
may return identifiers of processes outside of the prison. These processes
may not be acted upon. The usual error when processes outside of the prison
are accessed is ESRCH, although this isn't always the case. Values from
sysctl(3) are process-filtered if matching the KERN_PROC, KERN_PROC2, or
KERN_PROC_ARGS values (affecting ps(1) and other utilities). Calls to
setrlimit(2) are denied with EPERM if setting the maximum value beyond the
parent sysjail process's maximum value (this does not conform to FreeBSD's
jail).
The kill(2) function requires special mention. If a process sends a signal to process 1, init(8), the sig value is changed to 0 and the call is allowed (the signal is ignored, but non-super-user processes receive an EPERM error). If the pid is -1 and the process owner is the super-user, the relevant sig is delivered to all processes in the jail except the caller.
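What a sig value of 0 means can be seen from any ordinary shell: signal 0 performs only the existence and permission checks of kill(2) and delivers nothing. The following is a minimal sketch run outside of any jail; probe is a hypothetical helper name and not part of sysjail.

```shell
#!/bin/sh
# probe PID: send signal 0, which only checks that the process exists and
# that the caller is permitted to signal it; no signal is delivered.
# "probe" is a hypothetical helper, not part of sysjail.
probe() {
    if kill -0 "$1" 2>/dev/null; then
        echo "signalable"
    else
        echo "denied or missing"
    fi
}

probe "$$"   # the invoking shell itself
probe 1      # init(8); the result depends on the caller's privileges
```

Given the behaviour described above, a non-super-user process probing process 1 from inside a jail would be expected to take the failure branch.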
Miscellaneous Resources
Calls to reboot(2) are denied with EPERM but will regardless cause the jail
to halt if raised by the super-user. Most writable sysctl(3) entities are
denied with EPERM. All ioctl(2) LKM operations, those found in sys/lkm.h,
are denied with EPERM.
The following is a mini-tutorial for preparing a prison to service sshd(8). The host must first be analysed and appropriately configured, then packages downloaded, then jails configured appropriately. This is a very general approach. We consider a host with address 192.168.1.20 and jails with addresses from 192.168.1.21. Jails will be located in /mnt/jails with names j1 through j9. In this example, this range will be expressed by <jail> (as in /mnt/jails/<jail>/ when referring to file-system root).
Host Configuration
The host may be configured to have a network interface alias for each
jail. To accomplish this, various services must be modified to account for
aliasing. httpd(8) on the host must have its Listen value changed to
192.168.1.20 (the host address) to prevent it from listening on all
addresses. The same must be done for the sshd(8) ListenAddress value.
portmap(8), on OpenBSD, listens on all interfaces by default; this cannot
be changed. Records in inetd.conf(5) should also be appropriately modified.
All of these services must be restarted after changing their
configurations. In our example, we configure the extra aliases by adding an
alias entry for each jail address to hostname.if(5), then restart the
network with netstart(8).
Since this jail deployment is relatively simple, we configure syslogd(8) in /etc/rc.conf.local to use -a /mnt/jails/<jail>/dev/log. An alternative solution is to use -u and use host-based logging for jails.
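With nine jails the -a list is tedious to write by hand. The sketch below builds it for j1 through j9. The syslogd_jail_flags helper is hypothetical, and the syslogd_flags variable name is an assumption based on OpenBSD's rc.conf conventions; syslogd(8) may also limit how many additional sockets it accepts.

```shell
#!/bin/sh
# Build the syslogd(8) flags for jails j1..j9: one "-a <socket>" per jail.
# "syslogd_jail_flags" is a hypothetical helper, not part of sysjail.
syslogd_jail_flags() {
    flags=""
    i=1
    while [ "$i" -le 9 ]; do
        flags="$flags -a /mnt/jails/j$i/dev/log"
        i=$((i + 1))
    done
    # Strip the single leading space left by the first iteration.
    echo "${flags# }"
}

# Print a line intended for /etc/rc.conf.local (the variable name is an
# assumption; check rc.conf(8) on the host).
echo "syslogd_flags=\"$(syslogd_jail_flags)\""
```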
Acquire Packages
Packages must first be acquired with mkjail(1), which is a simple wrapper
for downloading binary sets and configuring device directories:
# mkjail -c /pub/sets -d ssh ftp://site/pub/ /mnt/jails/j1
This caches package sets in /pub/sets, creates sshd(8) devices, downloads from ftp://site/pub/ (which we assume contains a NetBSD or OpenBSD distribution), and unpacks into /mnt/jails/j1. Subsequent jails may be deployed as follows:
# mkjail -d ssh /pub/sets /mnt/jails/<jail>
At this point, the jails are resident on each system, and may be configured and executed appropriately.
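Deploying the remaining jails can be scripted. The sketch below only prints the mkjail(1) command for j2 through j9, using exactly the arguments shown above; deploy_rest is a hypothetical helper name.

```shell
#!/bin/sh
# Print the mkjail(1) commands for the remaining jails, j2 through j9,
# reusing the set cache in /pub/sets filled by the first invocation.
# Dry run only: remove "echo" to execute the commands as the super-user.
# "deploy_rest" is a hypothetical helper, not part of sysjail.
deploy_rest() {
    i=2
    while [ "$i" -le 9 ]; do
        echo mkjail -d ssh /pub/sets "/mnt/jails/j$i"
        i=$((i + 1))
    done
}

deploy_rest
```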
Jail Configuration
The password of each jail's root user must be set:
# jail /mnt/jails/j1 j1 192.168.1.21 /usr/bin/passwd root
The root password must then be entered at the prompt. The same process must be repeated for all jails. Finally, the jails must be started:
# sysjail -b -4 192.168.1.21 /mnt/jails/j1 j1 /bin/sh /etc/rc
This starts the jail in the background using its rc(8). It may be logged into using ssh(1):
% ssh root@192.168.1.21
The jail will exit when all imprisoned processes have exited.
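Starting every jail in the tutorial follows the same pattern. The sketch below prints the start command for a given jail number, assuming that jail jN is assigned address 192.168.1.(20+N); only j1 at 192.168.1.21 is stated above, so the rest of the mapping is an assumption. start_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# start_cmd N: print the sysjail command that starts jail jN in the
# background. Assumption: jail jN uses address 192.168.1.(20+N).
# "start_cmd" is a hypothetical helper, not part of sysjail.
start_cmd() {
    echo sysjail -b -4 "192.168.1.$((20 + $1))" "/mnt/jails/j$1" "j$1" \
        /bin/sh /etc/rc
}

# Dry run over j1..j9; pipe the output to sh(1) as the super-user to
# actually start the jails.
n=1
while [ "$n" -le 9 ]; do
    start_cmd "$n"
    n=$((n + 1))
done
```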
The sysjail utility returns the exit code of the process started with command. If internal errors occur, it returns 127.
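A caller may wish to tell the two cases apart. Note that 127 is ambiguous, since command itself may also exit with that value. The sketch below uses a subshell as a stand-in for a foreground sysjail invocation; describe_status is a hypothetical helper name.

```shell
#!/bin/sh
# describe_status CODE: interpret an exit status collected from sysjail.
# "describe_status" is a hypothetical helper, not part of sysjail.
describe_status() {
    case "$1" in
        0)   echo "command succeeded" ;;
        127) echo "sysjail internal error, or command exited with 127" ;;
        *)   echo "command failed with status $1" ;;
    esac
}

# Stand-in for a foreground sysjail invocation (assumption: the real
# command line would replace the subshell below).
status=0
( exit 127 ) || status=$?
describe_status "$status"
```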
sysjail(3), sjls(1), jail(1), jls(1)
The sysjail tool is a complete re-write of sysjail 1.0.4.
The sysjail utility was written by Kristaps Dzonsons for the bsd.lv project.
When kill(2) is called with -1 as the pid, the signal is delivered to all processes in the jail except the caller. However, since exit notifications are asynchronous, a process may exit during this operation. If another process starts in that time with the same PID, that process will be signalled instead. It's very unlikely that this race condition can be exploited.
If many `cannot create /dev/null: Device not configured' errors appear, this is because the dev directory in the jail is not on the root device.
It's of critical importance that user identifiers do not propagate across jails or into the host system. In other words, if uid 100 exists both within a jail and on the host system, the prison's uid is, from the kernel's perspective, identical to the host's. Calls to setpriority(2) and similar functions will affect both users. Since uids are not managed by the kernel (as regards mapping to an environment), and users may be arbitrarily assigned (perhaps maliciously), future versions of sysjail might offer configuration values for restricting uid and gid addressing.
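Until such configuration exists, overlapping uids can at least be detected by hand. The sketch below lists every uid present in both of two passwd(5)-format files; shared_uids is a hypothetical helper name, and the jail path follows the tutorial layout.

```shell
#!/bin/sh
# shared_uids HOST_PASSWD JAIL_PASSWD: print each uid (third colon-separated
# field) that appears in both files, once, in ascending order.
# "shared_uids" is a hypothetical helper, not part of sysjail.
shared_uids() {
    awk -F: '
        NR == FNR { host[$3] = 1; next }          # first file: record uids
        ($3 in host) && !seen[$3]++ { print $3 }  # second file: overlaps
    ' "$1" "$2" | sort -n
}

# Example run against the first tutorial jail, if it exists on this host.
if [ -r /etc/passwd ] && [ -r /mnt/jails/j1/etc/passwd ]; then
    shared_uids /etc/passwd /mnt/jails/j1/etc/passwd
fi
```

System accounts such as root (uid 0) will normally appear in the output; the entries of interest are ordinary user uids that the host and a jail have in common.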