October 4, 2006

Replacing init with Upstart

Author: Scott James Remnant

For years, most Linux distributions have been using an init daemon based on the one found in Unix System V. The init daemon is spawned by the kernel itself, and tasked with booting the rest of the system, starting all other processes, and taking care of them when they need to be stopped or when they die. While the System V init setup has worked well for Linux in the past, it hasn't aged well, which is why we're replacing it with Upstart in Ubuntu 6.10, codenamed Edgy Eft.

Upstart is designed to be a replacement for System V Init (sysvinit), with a different purpose than other init replacements that have been developed. The problem that has been facing us as we've attempted to make things "just work" in Ubuntu is that modern desktop computers and servers are very different beasts from those in use ten years ago.

In the past, boot-time operations, such as checking and mounting the filesystem, were relatively simple. Only a limited arrangement of hardware was supported, and connection or disconnection of devices while the power was on was not possible. The kernel always knew the number and type of disks, so the user-space boot process could simply check and mount them without difficulty.

Dealing with modern hardware

Today we have to deal with a very different world. Hardware, such as disk drives and input devices, can be connected to buses that support a near-unlimited number of devices in arbitrarily complicated topologies.

To make matters more interesting, these devices can be plugged in and removed at any time, and are not required to power up until they are going to be used. While this means that the kernel now receives interrupts when hardware changes, the initial discovery of connected hardware is more problematic; a probe can be sent, but there's no point at which we know we have received all the responses.

When we bring network filesystems into the equation, the problem gets even more difficult. Not only do we have to wait for responses from the server, the network interfaces themselves need to be configured; this may involve uploading firmware into the card, negotiating with a wireless access point, requesting an IP address from a server on the network, and logging in to a Web page to gain Internet access.

Some solutions do exist for these problems. For example, Ubuntu probes for likely hardware on which it will find the root filesystem, and then sits in a tight loop until the expected hardware appears. These kinds of tricks rely both on waiting in the boot process and on knowing the hardware that we're waiting for. As users demand that their computers boot ever faster, waiting in the boot process is simply untenable. Knowing in advance the hardware we're looking for might seem like a given, yet a lot of the more interesting features we'd like to introduce break this assumption.

We would like certain daemons, such as the HP Linux Printing and Imaging System, to be started only when the associated hardware is actually available. This way users who have the hardware benefit, and users who don't are not penalized. More interesting ideas include only running podcast daemons while an iPod is connected, and so on.

The Linux kernel has become increasingly better at supporting hot-pluggable hardware, and with the introduction of udev and the kernel driver-core, user-space is now provided with information about the connected hardware, as well as notification of connection and disconnection. Yet, despite this, we still rely on a static set of shell scripts for the boot process which make assumptions about what hardware is available at what point in the boot process.

Other attempts to modernize the init system

Previous attempts at replacing the init system, such as the LSB specification and initNG, have largely been focused on automatically determining the order of these scripts, both to ease the task of the developer and to allow parallel start-up of scripts to reduce boot time.

Other systems, such as Solaris SMF and runit, were instead designed to tackle service management problems, allowing a system administrator to query the state of, and change the set of, running services. Finally, Apple's launchd addresses neither of these use cases and instead provides a single daemon that replaces the traditional Unix services of init, cron, inetd, and so on.

None of these systems tackled the problems we wanted to solve. We wanted an init daemon that allowed the selection and order of scripts to be determined not just by information in the scripts themselves, but by events coming from outside the init system, in particular udev. In fact, what we wanted was an init sequence driven entirely by these events and those of its own making.

To avoid reinventing the wheel, we first looked at how much effort it would be to use or modify the existing replacements to be able to do this. Sun SMF and Apple launchd were immediately ruled out due to licence issues. It was important for us that the solution be uncontroversially free so that other distributions might adopt it; many had already rejected these for GPL incompatibility reasons.

InitNG received a lot of study, including wondering whether our problems could be solved by dependencies on scripts that waited until released by udev events. However, we felt that it was no better a solution than what we had already. The same solutions could be implemented with our existing init scripts using the features introduced in the LSB specification, without replacing the init daemon itself.

What we wanted was a system that would allow filesystems listed in /etc/fstab to be checked and mounted when the kernel detected the block device. Likewise, it would configure network interfaces when the kernel detected the hardware and, if an IP address were obtained, attempt to mount remote filesystems. The more we thought about this, the more we realized that the chain of events did not need to stop at hardware either. If the scripts themselves generated events when they finished, and daemons generated events once they were running, other scripts and daemons could be started by those events.

An event-based init daemon

The term we adopted for this design was an event-based init daemon. Jobs to be managed would be split into two basic categories: scripts or binaries to be run when an event occurs, and services to be both started and stopped when events occur. Events would be generated by the init daemon at particular points, such as startup and shutdown, as well as whenever jobs changed state; other events would be received from other parts of the system such as udev.

An interesting comparison can be drawn to the dependency-based init daemons such as initNG; they work by giving each job a list of dependencies, and starting the dependencies when they are required by something else. This means that an ultimate set of goals is required to ensure that anything is started at all, normally given as the configuration for the different runlevels.

A dependency-based init daemon would start networking because it's a dependency of the Apache goal, and would mount the filesystems because they are a dependency of both the Apache and gdm goals. If either gdm or Apache fails to start, this means that networking won't be available unless it is itself a goal.
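
The goal-driven behaviour can be sketched in a few lines of Python. This is only a toy model, not the code of initNG or any real init daemon; the job names and the dependency table are assumptions made up for the illustration.

```python
# Toy model of a dependency-based init: a job is started only because
# some goal (directly or indirectly) depends on it.
# The table below is a hypothetical example, not a real configuration.
DEPENDENCIES = {
    "apache": ["networking", "filesystems"],
    "gdm": ["filesystems"],
    "networking": [],
    "filesystems": [],
}


def start_order(goals, dependencies=DEPENDENCIES):
    """Return the jobs, in start order, needed to satisfy the given goals."""
    order = []

    def visit(job):
        if job in order:
            return  # already scheduled by an earlier goal
        for dep in dependencies[job]:
            visit(dep)  # dependencies are started before the job itself
        order.append(job)

    for goal in goals:
        visit(goal)
    return order
```

With both goals, `start_order(["apache", "gdm"])` schedules networking and the filesystems first. With gdm as the only goal, networking never appears in the result, because nothing asked for it.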

An event-based init daemon works the other way around; it starts off with a single event such as "startup" and everything else will be run by that or subsequent events. An event-based init daemon has no need for goals or runlevels: the system will boot as far as it can get with the available hardware. For a distribution, this means that the default installation can be far more flexible.

Networking will always be started if networking hardware is available, assuming the default configuration is for DHCP to be attempted. As with the dependency-based system, if no hardware is connected at boot time, Apache still won't start. However, with an event-based system, if the network card is plugged in a few minutes later, once it's been retrieved from the back of the sofa, Apache would be started automatically.
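
A minimal sketch of the same idea in Python, assuming a hypothetical `EventInit` class and made-up job wiring (Apache started by "network-interface-up" is an assumption for the example, and none of this is Upstart's actual code):

```python
from collections import defaultdict, deque


class EventInit:
    """Toy event-driven init: jobs are started by named events."""

    def __init__(self):
        # event name -> list of (job name, events the job emits once started)
        self.jobs_by_event = defaultdict(list)
        self.started = []

    def add_job(self, name, start_on, emits=()):
        for event in start_on:
            self.jobs_by_event[event].append((name, tuple(emits)))

    def emit(self, event):
        pending = deque([event])
        while pending:
            current = pending.popleft()
            for name, emits in self.jobs_by_event[current]:
                if name not in self.started:  # one instance per job
                    self.started.append(name)
                    pending.extend(emits)  # follow-on events drive more jobs


def build_example():
    """Hypothetical wiring: udev on startup, Apache once an interface is up."""
    init = EventInit()
    init.add_job("udev", start_on=["startup"])
    init.add_job("networking", start_on=["network-interface-added"],
                 emits=["network-interface-up"])
    init.add_job("apache", start_on=["network-interface-up"])
    return init
```

Emitting "startup" on the example starts only udev. If "network-interface-added" arrives minutes later, networking and then Apache are started by that event alone, with no goal or runlevel involved.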

Our example tasks of mounting filesystems might work as follows:

  1. A "startup" event causes the udev daemon to be started.
  2. The udev daemon is then configured to send a "block-device-added" event for each new block device.
  3. A "block-device-added" event causes the device to be checked and mounted if listed in /etc/fstab.
  4. When all FHS-specified filesystems listed in /etc/fstab have been mounted, the "fhs-filesystem" event is sent.

Alongside this, we can also be configuring network interfaces and mounting remote filesystems:

  1. The udev daemon is configured to send a "network-interface-added" event for each new network interface.
  2. Then a "network-interface-added" event causes the interface to be brought up and configured if listed in /etc/network/interfaces.
  3. Acquisition of an IP address causes the "network-interface-up" event to be sent.
  4. Setting of the default route causes the "default-route-up" event to be sent.
  5. Either of these events causes an attempt to mount remote network filesystems, which can therefore cause the "fhs-filesystem" event.

These chains of events, along with others, all cause different parts of the system to come up no matter how the user has configured it. There is no need for the dependencies of the jobs to be adjusted if the system doesn't rely on any network filesystems; the event is simply issued earlier.
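
The two chains above can be modelled roughly as follows. This is a sketch under simplifying assumptions: the `FilesystemJob` class is hypothetical, /etc/fstab is reduced to (source, mount point, kind) tuples, every entry is treated as FHS-specified, and "mounting" just records the mount point.

```python
class FilesystemJob:
    """Toy job that mounts fstab entries as events arrive."""

    def __init__(self, fstab):
        self.fstab = list(fstab)  # (source, mount_point, kind) tuples
        self.mounted = set()
        self.emitted = []  # events this job would send back to init

    def handle(self, event, device=None):
        if event == "block-device-added":
            # Mount only the local entries that live on the new device.
            wanted = [m for s, m, k in self.fstab
                      if k == "local" and s == device]
        elif event in ("network-interface-up", "default-route-up"):
            # Either network event triggers an attempt at the remote entries.
            wanted = [m for s, m, k in self.fstab if k == "remote"]
        else:
            wanted = []
        self.mounted.update(wanted)

        all_points = {m for _, m, _ in self.fstab}
        if self.mounted == all_points and "fhs-filesystem" not in self.emitted:
            self.emitted.append("fhs-filesystem")
```

With only local entries, "fhs-filesystem" is recorded as soon as the last block device has been handled. Add a remote entry and the same job holds the event back until a network event arrives, without any change to the job itself.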

Defining events

Events themselves are simple strings, with optional environment variables to convey more detailed information to the jobs handling the event, though we are considering adding arguments or an origin for the purposes of identifying which device was added, or which job was stopped, etc. Any part of the system can send an event by using the "initctl" tool, or by communicating with init directly over a Unix domain socket.

Jobs are defined by files in the /etc/event.d directory, and the simplest can be just the name of a binary, or some shell code to run. The more complex can include shell code to be executed before the job is started and other shell code for after the job has stopped, as well as resource limits and environment settings. Jobs can be started when any of a list of named events occur, with multiple running instances either permitted or disallowed. They can be allowed to terminate normally themselves, stopped when another list of named events occur, or respawned on termination until any event in that list occurs.
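
The start, stop, and respawn rules described here can be summarized as a small state machine. The Python below is an illustrative sketch only: the `Job` class and its field names are assumptions for this example and do not reflect the /etc/event.d file format or Upstart's internals.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Toy model of a job's lifecycle as driven by named events."""

    name: str
    start_on: frozenset = frozenset()  # events that start the job
    stop_on: frozenset = frozenset()   # events that stop it
    respawn: bool = False              # restart if it exits by itself
    running: bool = False
    restarts: int = 0

    def on_event(self, event: str) -> None:
        if self.running and event in self.stop_on:
            self.running = False
        elif not self.running and event in self.start_on:
            self.running = True

    def on_exit(self) -> None:
        """The job's process terminated without being asked to."""
        if self.running and self.respawn:
            self.restarts += 1  # respawned, so it stays running
        else:
            self.running = False  # allowed to terminate normally
```

For example, a job created with `start_on=frozenset({"startup"})`, `stop_on=frozenset({"shutdown"})`, and `respawn=True` keeps coming back after each exit until a "shutdown" event arrives, after which a further exit leaves it stopped.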

A goal of the project is to simplify how the system is configured. Right now, jobs to be run on startup, hourly, when a network device is up, when the AC power is reconnected, on shutdown, or when the system is suspended each have to be configured in a different place.

Because events can be received from any other process on the system, we intend to modify other daemons so that instead of running directories of scripts themselves, they send an event to init so all such jobs can be configured in /etc/event.d. Potential plans for the future include running jobs on behalf of users, time-based events so that daemons such as cron, anacron, and atd could be replaced and even, potentially, the ability to replace the inetd daemon.

It's our hope that with Upstart we can solve a lot of the problems that prevent Linux from "just working," without any ugly workarounds or race conditions, as well as make the job of sysadmins much easier. To find out more, visit the Upstart Website, which has packages for Ubuntu and Debian, source code, mailing lists, and bug tracking for Upstart.

Upstart is suitable for deployment in any Linux distribution as a replacement for sysvinit. One should be aware that, like other replacements, it has its own native configuration and will not run your existing init scripts unless configured to do so. The recommended migration plan, which Ubuntu is following, is to run the existing System V rc script and let that take care of the legacy init scripts.

A set of example jobs is available from the Website to do this, though they will require modification for other distributions. Once up and running, you can then migrate init scripts to Upstart jobs one by one, as time permits, without dropping compatibility with other applications.

The Edgy Eft repositories include Upstart, so you can install preview releases of Edgy, starting with Knot 3, to get a hands-on look at Upstart right now.
