A ``race condition'' can be defined as ``Anomalous behavior due to unexpected critical dependence on the relative timing of events'' [FOLDOC]. Race conditions generally involve one or more processes accessing a shared resource (such as a file or variable), where this multiple access has not been properly controlled.
In general, processes do not execute atomically; another process may interrupt a process between essentially any two instructions. If a secure program's process is not prepared for these interruptions, another process may be able to interfere with it. No pair of operations in a secure program should fail if another process executes arbitrary code between them.
Race condition problems can be notionally divided into two categories:
Interference caused by untrusted processes. Some security taxonomies call this problem a ``sequence'' or ``non-atomic'' condition. These are conditions caused by processes running other, different programs, which ``slip in'' other actions between steps of the secure program. These other programs might be invoked by an attacker specifically to cause the problem. This paper will call these sequencing problems.
Interference caused by trusted processes (from the secure program's point of view). Some taxonomies call these deadlock, livelock, or locking failure conditions. These are conditions caused by processes running the ``same'' program. Since these different processes may have the ``same'' privileges, if not properly controlled they may be able to interfere with each other in a way other programs can't. Sometimes this kind of interference can be exploited. This paper will call these locking problems.
In general, you must check your code for any pair of operations that might fail if arbitrary code is executed between them.
Note that loading and saving a shared variable are usually implemented as separate operations and are not atomic. This means that an ``increment variable'' operation is usually implemented as separate load, increment, and save steps, so if the variable is shared, another process may interfere between those steps.
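For example, here is a minimal sketch of protecting a shared counter, using POSIX threads purely for brevity (the same load/increment/store interleaving can occur between processes sharing memory; the names here are illustrative):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                    /* the shared variable */
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        long i;
        (void)arg;
        for (i = 0; i < 100000; i++) {
            /* Without the lock, "counter++" is a separate load,
             * increment, and store; two threads can interleave these
             * steps and lose updates.  The mutex forces all three to
             * finish before anyone else may run them. */
            pthread_mutex_lock(&counter_lock);
            counter++;
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);     /* 200000 only with the lock */
        return 0;
    }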
Secure programs must determine if a request should be granted, and if so, act on that request. There must be no way for an untrusted user to change anything used in this determination before the program acts on it. This kind of race condition is sometimes termed a ``time of check - time of use'' (TOCTOU) race condition.
This issue repeatedly comes up in the filesystem. Programs should generally avoid using access(2) to determine if a request should be granted, followed later by open(2), because users may be able to move files around between these calls, possibly creating symbolic links or files of their own choosing instead. A secure program should instead set its effective id or filesystem id, then make the open call directly. It's possible to use access(2) securely, but only when a user cannot affect the file or any directory along its path from the filesystem root.
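As a sketch of the second approach, assuming the program starts with root privileges and has already determined the requesting user's uid (the function name is only an illustration; a real program would normally switch its group IDs as well):

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Open a file on behalf of a (possibly hostile) requesting user.
     * Instead of asking access(2) "could this user open the file?" and
     * then opening it with our own (root) privileges -- a classic
     * time-of-check/time-of-use race -- we temporarily become the user
     * and let open(2) itself perform the permission check atomically. */
    int open_as_user(uid_t user, const char *path)
    {
        int fd;

        if (seteuid(user) != 0)        /* become the requesting user */
            return -1;
        fd = open(path, O_RDONLY);     /* kernel checks access as "user" */
        if (seteuid(0) != 0) {         /* regain root via saved set-user-ID */
            if (fd >= 0)
                close(fd);
            return -1;
        }
        return fd;
    }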
For example, when performing a series of operations on a file's metainformation (such as changing its owner, stat-ing the file, or changing its permission bits), first open the file and then use the operations on open files. This means using the fchown(), fstat(), or fchmod() system calls instead of the calls taking filenames, such as chown() and chmod(). Doing so prevents the file from being replaced while your program is running (a possible race condition). For example, if you close a file and then use chmod() to change its permissions, an attacker may be able to remove the file between those two steps and create a symbolic link to another file (say /etc/passwd). Other interesting targets include /dev/zero, which can provide an infinitely long stream of input data to a program. Also, avoid using the access() function to determine your ability to access a file: calling access() followed by open() is a race condition, and almost always a bug. These precautions are only necessary if it's possible for an untrusted process to modify the relevant directory or its ancestors.
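Here is a sketch of the open-first pattern (the permission change is only an example, and O_NOFOLLOW is assumed to be available, as it is on modern Linux and BSD systems):

    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Tighten a file's permissions by operating on the file we actually
     * opened, not on a pathname that an attacker might re-point (e.g.,
     * via a symbolic link) between calls. */
    int remove_group_other_write(const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDONLY | O_NOFOLLOW);   /* refuse symlinks */

        if (fd < 0)
            return -1;
        if (fstat(fd, &st) < 0 ||                     /* examine the open file... */
            fchmod(fd, st.st_mode & 07777 & ~(S_IWGRP | S_IWOTH)) < 0) {
            close(fd);                                /* ...and modify that same file */
            return -1;
        }
        return close(fd);
    }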
This issue particularly comes up in the /tmp and /var/tmp directories, which are shared by all users. Avoid using these directories and their subdirectories if possible. In particular, imagine what would happen if users created files (including symbolic links) at arbitrary times in directories you intend to use (for example, between the time you determine a filename and the time you try to open it). You can't even just check to see if the given file is a symbolic link; if it's owned by an untrusted user, the user could change this after the check.
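If you must create a file in such a shared directory, one mitigation is to insist that the file be brand new, as in this sketch (the filename is only an example and should contain an unpredictable component in practice; mkstemp(3) does that for you and is usually a better choice for temporary files):

    #include <fcntl.h>
    #include <sys/stat.h>

    /* Create a private file in a world-writable directory such as /tmp.
     * O_CREAT|O_EXCL fails if anything (including a symbolic link an
     * attacker planted) already exists under this name, and the mode
     * keeps other users out of the file we create. */
    int create_private_file(void)
    {
        return open("/tmp/myprog.example", O_RDWR | O_CREAT | O_EXCL,
                    S_IRUSR | S_IWUSR);
    }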
There are often situations in which a program must ensure that it has exclusive rights to something (e.g., a file, a device, and/or existence of a particular server process). Any system which locks resources must deal with the standard problems of locks, namely, deadlocks (``deadly embraces''), livelocks, and releasing ``stuck'' locks if a program doesn't clean up its locks. A deadlock can occur if programs are stuck waiting for each other to release resources. For example, a deadlock would occur if process 1 locks resource A and waits for resource B, while process 2 locks resource B and waits for resource A. Many deadlocks can be prevented by simply requiring all processes to lock resources in the same order (e.g., always lock resources in alphabetical order).
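A minimal sketch of the ordering rule, again using POSIX threads for brevity (the resource names are illustrative):

    #include <pthread.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* resource "A" */
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* resource "B" */

    /* Every code path that needs both resources takes the locks in the
     * same (here, alphabetical) order, so no two threads can each hold
     * one lock while waiting forever for the other. */
    void update_both(void)
    {
        pthread_mutex_lock(&lock_a);   /* always A first... */
        pthread_mutex_lock(&lock_b);   /* ...then B */
        /* ... work on both resources ... */
        pthread_mutex_unlock(&lock_b); /* release in reverse order */
        pthread_mutex_unlock(&lock_a);
    }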
On Unix-like systems resource locking has traditionally been done by creating a file to indicate a lock, because this is very portable. It also makes it easy to ``fix'' stuck locks, because an administrator can just look at the filesystem to see what locks have been set. Stuck locks can occur because the program failed to clean up after itself (e.g., it crashed or malfunctioned) or because the whole system crashed. Note that these are ``advisory'' (not ``mandatory'') locks - all processes needing the resource must cooperate to use these locks.
However, there are several traps to avoid. First, a program running with root privileges can open an existing lock file even if its permissions would deny access to ordinary users, because root bypasses permission checks. So, if you want to use a file to indicate a lock, but the locking programs might run as root, don't depend on restrictive file permissions; depend on the file's existence instead, for example by creating it with open(2)'s O_CREAT and O_EXCL flags (which fail if the file already exists, even for root), or by using link(2) to create a hard link to some file in the same directory (not even root can create a hard link over a name that already exists).
Second, if the lock file may be on an NFS-mounted filesystem, then you have the problem that NFS version 2 doesn't completely support normal file semantics. This can even be a problem for work that's supposed to be ``local'' to a client, since some clients don't have local disks and may have all files remotely mounted via NFS. The manual for open(2) explains how to handle things in this case (which also handles the case of root programs):
"... programs which rely on [the O_CREAT and O_EXCL flags of open(2)] for performing locking tasks will contain a race condition. The solution for performing atomic file locking using a lockfile is to create a unique file on the same filesystem (e.g., incorporating hostname and pid), use link(2) to make a link to the lockfile and use stat(2) on the unique file to check if its link count has increased to 2. Do not use the return value of the link(2) call."
Obviously, this solution only works if all programs doing the locking are cooperating, and if non-cooperating programs aren't allowed to interfere. In particular, the directories you're using for file locking must not have permissive file permissions for creating and removing files.
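Here is a sketch of that recipe in C; the way the unique name is built (lock file name plus hostname plus pid) follows the manual page's suggestion, and the buffer sizes are illustrative. Releasing the lock afterwards is simply a matter of unlink(2)ing the lock file.

    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Try to acquire "lockfile" using the link(2) technique quoted
     * above, which is atomic even over NFS version 2.
     * Returns 0 if we obtained the lock. */
    int acquire_lockfile(const char *lockfile)
    {
        char unique[512], host[256];
        struct stat st;
        int fd, locked = 0;

        if (gethostname(host, sizeof(host)) != 0)
            return -1;
        host[sizeof(host) - 1] = '\0';
        snprintf(unique, sizeof(unique), "%s.%s.%ld",
                 lockfile, host, (long)getpid());

        /* Create the unique file on the same filesystem as the lock. */
        fd = open(unique, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        if (fd < 0)
            return -1;
        close(fd);

        link(unique, lockfile);       /* deliberately ignore the return value */
        if (stat(unique, &st) == 0 && st.st_nlink == 2)
            locked = 1;               /* the link exists: the lock is ours */

        unlink(unique);               /* the unique name is no longer needed */
        return locked ? 0 : -1;
    }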
NFS version 3 added support for O_EXCL mode in open(2); see IETF RFC 1813, in particular the "EXCLUSIVE" value to the "mode" argument of "CREATE". Sadly, not everyone has switched to NFS version 3 at the time of this writing, so you can't depend on this in portable programs.
If you're locking a device or the existence of a process on a local machine, try to use standard conventions. I recommend using the Filesystem Hierarchy Standard (FHS); it is widely referenced by Linux systems, but it also tries to incorporate the ideas of other Unix-like systems. The FHS describes standard conventions for such locking files, including naming, placement, and standard contents of these files [FHS 1997]. If you just want to be sure that your server doesn't execute more than once on a given machine, you should usually create a process identifier file /var/run/NAME.pid with the pid as its contents. In a similar vein, you should place lock files for things like devices in /var/lock. This approach has the minor disadvantage of leaving files hanging around if the program suddenly halts, but it's standard practice and that problem is easily handled by other system tools.
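A sketch of writing such a pid file (``mydaemon'' is a placeholder name):

    #include <stdio.h>
    #include <unistd.h>

    /* Record this server's process id in /var/run/NAME.pid so that
     * administrators and system tools can find or signal the running
     * instance. */
    int write_pid_file(const char *path)
    {
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fprintf(f, "%ld\n", (long)getpid());
        return fclose(f);
    }

    /* e.g., write_pid_file("/var/run/mydaemon.pid"); */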
It's important that the programs which are cooperating using files to represent the locks use the "same" directory, not just the same directory name. This is an issue with networked systems: the FHS explicitly notes that /var/run and /var/lock are unshareable, while /var/mail is shareable. Thus, if you want the lock to work on a single machine, but not interfere with other machines, use unshareable directories like /var/run (e.g., you want to permit each machine to run its own server). However, if you want all machines sharing files in a network to obey the lock, you need to use a directory that they're sharing; /var/mail is one such location. See FHS section 2 for more information on this subject.
Of course, you need not use files to represent locks. Network servers often need not bother; the mere act of binding can act as a kind of lock, since if there's an existing server bound to a given port, no other server will be able to bind to that port.
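For instance, a sketch of using bind(2) itself as the ``lock'' (the port number is purely an example):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Binding to a fixed TCP port (here 5555) doubles as a lock: the
     * kernel guarantees only one process can hold the port, so a second
     * instance's bind(2) fails and it can simply exit. */
    int bind_or_give_up(void)
    {
        struct sockaddr_in addr;
        int s = socket(AF_INET, SOCK_STREAM, 0);

        if (s < 0)
            return -1;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(5555);

        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(s);                 /* another instance already holds the port */
            return -1;
        }
        return s;                     /* we are the only instance */
    }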
Another approach to locking is to use POSIX record locks, implemented through fcntl(2) as a ``discretionary lock''. These are discretionary, that is, using them requires the cooperation of the programs needing the locks (just as the approach to using files to represent locks does). There's a lot to recommend POSIX record locks: POSIX record locking is supported on nearly all Unix-like platforms (it's mandated by POSIX.1), it can lock portions of a file (not just a whole file), and it can handle the difference between read locks and write locks. Even more usefully, if a process dies, its locks are automatically removed, which is usually what is desired.
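A sketch of taking a whole-file write lock this way (the helper name is illustrative):

    #include <fcntl.h>
    #include <unistd.h>

    /* Take a write (exclusive) lock on an entire file using POSIX record
     * locking.  The lock is advisory, covers the whole file because
     * l_len == 0, and is released automatically if the process dies. */
    int lock_whole_file(int fd)
    {
        struct flock fl;

        fl.l_type = F_WRLCK;     /* exclusive lock; use F_RDLCK for a read lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;          /* from the beginning of the file... */
        fl.l_len = 0;            /* ...to the end, however far it grows */
        fl.l_pid = 0;

        return fcntl(fd, F_SETLKW, &fl);   /* wait until the lock is granted */
    }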
You can also use mandatory locks, which are based on System V's mandatory locking scheme. These only apply to files where the locked file's setgid bit is set, but the group execute bit is not set. Also, the filesystem must be mounted with mandatory locking enabled. In this case, every read(2) and write(2) is checked for locking; while this is more thorough than advisory locks, it's also slower. Also, mandatory locks don't port as widely to other Unix-like systems (they're available on Linux and System V-based systems, but not necessarily on others). Note that processes with root privileges can be held up by a mandatory lock, too, making it possible for a mandatory lock to be the basis of a denial-of-service attack.
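For completeness, here is a sketch of marking an open file for mandatory locking; remember that the filesystem must also be mounted with mandatory locking enabled (on Linux, the ``mand'' mount option), which is an administrative step rather than code:

    #include <sys/stat.h>
    #include <unistd.h>

    /* Mark an open file as a candidate for mandatory locking: set the
     * setgid bit and clear the group-execute bit, as required by the
     * System V scheme. */
    int enable_mandatory_locking(int fd)
    {
        struct stat st;

        if (fstat(fd, &st) < 0)
            return -1;
        return fchmod(fd, (st.st_mode & 07777 & ~S_IXGRP) | S_ISGID);
    }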