An introduction to Linux sound systems and APIs

When coding a program, one of the best ways to show users that an event has happened is to produce a sound, which is why sound is now present in almost every program. Every operating system offers sound systems and APIs for accessing the sound card, so that no low-level coding is required to use the sound device. Programmers have many choices about which system to use, especially under Linux, and maybe that’s the problem. This article illustrates the free sound architectures available under Linux, as well as the different interfaces a programmer can use.

Kernel sound drivers: OSS and ALSA

The most direct way is to talk to the kernel sound drivers. Linux has two:

Open Sound System

Open Sound System (OSS) comes in two versions: OSS/Free, which is free software maintained by the well-known kernel hacker Alan Cox, and 4Front Technologies’ OSS (OSS/Linux, formerly known as VoxWare, USS, and TASD), which is a proprietary implementation based on OSS/Free. OSS is available not only for Linux but also for the BSDs and other Unixes. That portability may be its only advantage, because the system is not very powerful and was officially replaced by ALSA in the 2.5 kernels.

I’m not going to cover OSS programming, since it is deprecated, but it is not very difficult. To sum up: open /dev/dsp, /dev/dspW, or /dev/audio depending on the format you want, read from and write to the file descriptor to exchange samples with the sound card, and use ioctl calls to set parameters such as volume. You can learn about advanced OSS programming in 4Front’s API specs.

ALSA

Advanced Linux Sound Architecture (ALSA) is the new Linux sound hardware abstraction layer that replaces OSS. In fact, it’s more than a simple HAL, because it also provides a user-space library named libasound. What’s more, it’s thread-safe, works well on SMP machines, and is backward-compatible with OSS/Free (through an OSS emulation module). Of course, it’s also free and open source. A full description of its features and API can be found on ALSA’s Web site, and I would also suggest reading Paul Davis’ tutorial.

Let’s take a look at ALSA’s API with a little example that will show the good and bad points of ALSA:

/* Example stolen from Paul Davis' tutorial (don't worry, he won't sue me -- GPL privilege)
 * Have just omitted the error handling for concision and added comments */

#include <stdio.h>
#include <stdlib.h>
#include <alsa/asoundlib.h>

int main (int argc, char *argv[])
{
	int i;
	int err;
	unsigned int rate = 44100;	/* requested rate; ALSA stores the rate it actually chose here */
	short buf[128 * 2] = {0};	/* 128 frames x 2 channels of silence */
	snd_pcm_t *playback_handle;
	snd_pcm_hw_params_t *hw_params;

	/* Open the device named on the command line */
	snd_pcm_open (&playback_handle, argv[1], SND_PCM_STREAM_PLAYBACK, 0);

	/* Allocate the hardware parameters structure and fill it with the full config space for the PCM */
	snd_pcm_hw_params_malloc (&hw_params);
	snd_pcm_hw_params_any (playback_handle, hw_params);

	/* Set parameters: interleaved channels, 16 bits little endian, 44100Hz, 2 channels */
	snd_pcm_hw_params_set_access (playback_handle, hw_params, SND_PCM_ACCESS_RW_INTERLEAVED);
	snd_pcm_hw_params_set_format (playback_handle, hw_params, SND_PCM_FORMAT_S16_LE);
	snd_pcm_hw_params_set_rate_near (playback_handle, hw_params, &rate, 0);
	snd_pcm_hw_params_set_channels (playback_handle, hw_params, 2);

	/* Assign them to the playback handle and free the parameters structure */
	snd_pcm_hw_params (playback_handle, hw_params);
	snd_pcm_hw_params_free (hw_params);

	/* Prepare & Play */
	snd_pcm_prepare (playback_handle);
	for (i = 0; i < 10; i++) {
		if ((err = snd_pcm_writei (playback_handle, buf, 128)) != 128) {
			/* (...) */
		}
	}

	/* Close the handle and exit */
	snd_pcm_close (playback_handle);
	exit (0);
}
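
The example above never puts meaningful samples in buf. As a rough sketch of the layout that SND_PCM_ACCESS_RW_INTERLEAVED combined with SND_PCM_FORMAT_S16_LE expects, the helper below fills a stereo buffer with a square wave. One frame is one left sample followed by one right sample, so 128 frames need 256 shorts. The name fill_square_s16 and its parameters are made up for illustration and are not part of ALSA, and the code assumes a little-endian host so that int16_t values already match S16_LE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ALSA function.
 * Fill `frames` interleaved stereo frames (L, R, L, R, ...) with a square
 * wave whose half-period is `half_period` frames (must be > 0).
 * `buf` must hold at least frames * 2 samples. */
void fill_square_s16(int16_t *buf, size_t frames, size_t half_period,
                     int16_t amplitude)
{
	size_t i;

	for (i = 0; i < frames; i++) {
		/* Positive during even half-periods, negative during odd ones. */
		int16_t s = ((i / half_period) % 2 == 0) ? amplitude
		                                         : (int16_t)-amplitude;

		buf[2 * i]     = s;	/* left channel  */
		buf[2 * i + 1] = s;	/* right channel */
	}
}
```

With a 256-element buffer, calling fill_square_s16 (buf, 128, 50, 8000) before snd_pcm_writei (playback_handle, buf, 128) would give a period of 100 frames, which is a 441Hz tone at 44100Hz.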

As you can see, the API is quite clear and not very hard to understand, even if it’s a bit long. ALSA works at a level low enough to let the programmer choose another design, called interrupt-driven or callback-driven, which is fundamentally better because:

There is no blocking on reads/writes.
The application is “driven” by the callbacks and can continue to run.
The code is easily portable to other sound systems.

The low-level capabilities of ALSA make it a powerful system, but code becomes very tricky when it comes to full duplex with the callback method, for which other sound systems may be preferable. (Even the ALSA Audio API Tutorial advises using JACK for full duplex.)
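
The callback-driven design mentioned above can be sketched without any sound library. In this minimal illustration the application only supplies a function, and a stand-in "engine" decides when to call it. The names process_cb, run_engine, and silence_cb are hypothetical; a real sound system would invoke the callback from its own thread whenever the hardware needs another period of audio.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature of the application's callback: fill `out` with `nframes`
 * mono samples and return 0, or non-zero to ask the engine to stop. */
typedef int (*process_cb)(int16_t *out, size_t nframes, void *arg);

/* Stand-in for the sound system (hypothetical). It simply calls `cb`
 * `periods` times with a 64-frame buffer and returns the number of
 * periods actually processed. */
size_t run_engine(process_cb cb, void *arg, size_t periods)
{
	int16_t period_buf[64];
	size_t done;

	for (done = 0; done < periods; done++) {
		if (cb(period_buf, 64, arg) != 0)
			break;
		/* A real engine would hand period_buf to the sound card here. */
	}
	return done;
}

/* Example callback: writes silence and counts the frames it produced. */
int silence_cb(int16_t *out, size_t nframes, void *arg)
{
	size_t *total = arg;
	size_t i;

	for (i = 0; i < nframes; i++)
		out[i] = 0;
	*total += nframes;
	return 0;
}
```

Because the application code never loops over blocking writes itself, the same callback could in principle be registered with a different engine, which is the portability argument made in the list above.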

About the practical use of Kernel Drivers…

Besides the full duplex difficulty, ALSA poses another problem for multimedia applications, the one that motivated the creation of sound servers: such programs need concurrent access to the sound card, and it is not acceptable for only one application at a time to be able to produce and capture sound. In practice, designers must determine at which level a program should act. If it needs low-level access, ALSA can be a good solution, but if sound is not the main part of the project or if high-level operations are needed, consider instead the sound systems we’ll talk about next.

Sound servers

Sound servers are programs that sit atop the audio core and put one more layer between the user and the hardware. Having the server talk to the kernel’s audio API on the application’s behalf comes with a small performance hit, but it results in a simpler API and makes software-based sample mixing possible. Software-based sample mixing lets applications play multiple sounds at the same time on a single sound card without needing a sound card that natively supports that. With it, applications can share the sound hardware, because sound servers support multiple channels (the kernel sound drivers support only one) by multiplexing them and streaming the result to /dev/dsp. Some sound servers (esd, aRTs, NAS) are also built on a client/server model that enables sound to be played remotely and transparently over a network: this is called network transparency. If you want sound servers with such features, take a look at the Squeak homepage.
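
To make software-based sample mixing concrete, here is a minimal sketch of the core operation: adding two signed 16-bit streams sample by sample and clamping the result. It is an illustration only, not code taken from any of the servers discussed here, and mix_s16 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two signed 16-bit streams sample by sample into `out`.
 * The sum is computed in 32 bits and clamped, so loud passages
 * saturate instead of wrapping around. */
void mix_s16(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int32_t sum = (int32_t)a[i] + (int32_t)b[i];

		if (sum > INT16_MAX)
			sum = INT16_MAX;
		else if (sum < INT16_MIN)
			sum = INT16_MIN;
		out[i] = (int16_t)sum;
	}
}
```

A server would run something like this over one buffer per client before writing the single mixed stream to the device.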

ESD

ESD, short for Enlightenment Sound Daemon, was originally developed for Enlightenment and is now part of the GNOME Project. EsounD supports full duplex and network transparency, and is especially suited for sound effects and long unsynchronized music. You can extract the API from the source: esd.h and esdlib.c. Compile with gcc -o esdtest esdtest.c `esd-config --cflags --libs`.

/* Let's see a skeleton that a recording program can change */
#include <stdio.h>	/* for NULL */
#include <unistd.h>	/* for read() and close() */
#include "esd.h"

int main()
{
	char buf[ESD_BUF_SIZE];
	int sock = -1;

	/* Set format : 16bits stereo stream for recording */
	esd_format_t format = ESD_BITS16 | ESD_STEREO | ESD_STREAM | ESD_RECORD;

	/* And only 1 command to open the recording :) with the format defined earlier,
	 * ESD's default rate (ESD_DEFAULT_RATE -> 44100Hz),
	 * on localhost:16001 (default -> NULL), and  with "testprog" as internal name */
	sock = esd_record_stream_fallback(format, ESD_DEFAULT_RATE, NULL, "testprog");
	if (sock <= 0) return 1;

	/* And now process the recorded data */
	while (read(sock, buf, ESD_BUF_SIZE) > 0)
	{
		/* (...) */
	}
	close(sock);
	return 0;
}

Piece of cake, isn’t it? The esd_record_stream function is in fact a wrapper: it calls esd_open_sound(hostname) to connect to the server, negotiates with it, then sets the socket buffer sizes with esd_set_socket_buffers(sock, format, rate, 44100). The _fallback functions fall back to ALSA and OSS to try to play the sound if ESD fails, which is quite useful.
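
The body of the recording loop is left open in the skeleton. As one possible treatment, the sketch below computes the peak level of a block of 16-bit samples, which a recording program might use to drive a level meter. The function peak_s16 is hypothetical and not part of the ESD API, and the code assumes the samples arrive in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the largest absolute sample value found in `nbytes` bytes of
 * 16-bit PCM. Assumes host byte order; a trailing odd byte, if any,
 * is ignored. */
int peak_s16(const char *buf, size_t nbytes)
{
	int peak = 0;
	size_t i;

	for (i = 0; i + 1 < nbytes; i += 2) {
		int16_t s;
		int v;

		memcpy(&s, buf + i, sizeof s);	/* safe even if buf is unaligned */
		v = s < 0 ? -(int)s : (int)s;
		if (v > peak)
			peak = v;
	}
	return peak;
}
```

Inside the loop, the byte count returned by read() would be passed as nbytes, since a read on a socket may return fewer bytes than requested.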

aRTs

The Analog RealTime Synthesizer is KDE’s sound server. Support for it is progressively fading, and it will probably be abandoned in the future in favor of JACK. Nevertheless, various commentaries suggest that aRTs has better sound quality than ESD thanks to better sound processing routines (but also higher latency, due to their complexity). aRTs also supports full duplex (though it has been reported to be a bit buggy in this area) and network transparency, and it works on BSD operating systems. Documentation about the aRTs C API is scarce (see the aRTs project Web site for a short page about it), so the best thing to do is to take a look at the source (artsc.h).

Here’s a little example to compile with gcc -o artstest artstest.c `artsc-config --cflags` `artsc-config --libs`.

#include <stdio.h>
#include <artsc.h>

int main()
{
    arts_stream_t stream;
    char buffer[8192];
    int bytes;
    int errorcode;

    /* Initialise aRTs with arts_init() */
    if ((errorcode = arts_init()) < 0)
    {
        fprintf(stderr, "arts_init error: %s\n", arts_error_text(errorcode));
        return 1;
    }

    /* Open a stream for playback at 44100Hz, 16 bits, 2 channels as "aRTstest" */
    stream = arts_play_stream(44100, 16, 2, "aRTstest");

    /* example of treatment : read music from stdin and play it with arts_write */
    while((bytes = fread(buffer, 1, 8192, stdin)) > 0)
    {
        if ((errorcode = arts_write(stream, buffer, bytes)) < 0)
        {
            fprintf(stderr, "arts_write error: %s\n", arts_error_text(errorcode));
            return 1;
        }
    }

    /* Does what it says */
    arts_close_stream(stream);
    arts_free();

    return 0;
}

The API is also very simple, as you can see. Some other useful commands include arts_suspend, to free the DSP device for aRTs-incapable programs to access it, and arts_stream_set, to configure some stream parameters.

JACK

JACK (also called JACKit) follows the long tradition of recursive acronyms: in this case, JACK Audio Connection Kit. The project was created as an implementation of the Linux Audio Applications Glue API project, which aimed at creating a high-bandwidth, low-latency inter-application communication API. It is a real-time sound server written for POSIX systems (and currently available for Linux and OS X) that enables different applications to have synchronous connections to the audio hardware and to share audio among themselves via a system of ports. Programs can run as normal independent applications or as plugins within the JACK server. It uses the callback method described earlier, implements ringbuffers, and is, in my humble opinion, the most excellent and promising sound server. The only bad point is that it is not widely available at the moment, but that should be fixed soon. (Gentoo already includes it and there are some third-party RPMs for Fedora Core.) The API is well documented and available on SourceForge. A fully documented example of a capture client is available on the Berman home page.

Here we’ll start with something softer and smaller. Compilation is done using gcc -o jacktest `pkg-config --cflags --libs jack` jacktest.c.

/* Lighter version of simple_client.c */

#include <stdio.h>
#include <stdlib.h>	/* for exit() */
#include <string.h>	/* for memcpy() */
#include <errno.h>
#include <unistd.h>
#include <jack/jack.h>

jack_port_t *input_port;
jack_port_t *output_port;

/* Processing thread: only transmit the data from input to output */
int process (jack_nframes_t nframes, void *arg)
{
	jack_default_audio_sample_t *out = (jack_default_audio_sample_t *) jack_port_get_buffer (output_port, nframes);
	jack_default_audio_sample_t *in = (jack_default_audio_sample_t *) jack_port_get_buffer (input_port, nframes);

	memcpy (out, in, sizeof (jack_default_audio_sample_t) * nframes);

	return 0;
}

void jack_shutdown (void *arg)
{
	exit (1);
}

int main ()
{
	jack_client_t *client;

	/* try to become a client of the JACK server */

	if ((client = jack_client_new ("test_client")) == 0) {
		fprintf (stderr, "jack server not running?\n");
		return 1;
	}

	/* tell the JACK server to call `process()' whenever there is work to be done. */
	jack_set_process_callback (client, process, 0);

	/* tell the JACK server to call `jack_shutdown()' if it ever shuts down, either entirely, or if it
	   just decides to stop calling us. */
	jack_on_shutdown (client, jack_shutdown, 0);

	/* display the current sample rate */
	printf ("engine sample rate: %lu\n", (unsigned long) jack_get_sample_rate (client));

	/* create two ports: 1 input & 1 output*/
	input_port = jack_port_register (client, "input", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
	output_port = jack_port_register (client, "output", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);

	/* tell the JACK server that we are ready to roll */
	if (jack_activate (client)) {
		fprintf (stderr, "cannot activate client\n");
		return 1;
	}

	/* connect the ports: input one to the first ALSA PCM input and output to the first ALSA PCM output */
	if (jack_connect (client, "alsa_pcm:in_1", jack_port_name (input_port))) {
		fprintf (stderr, "cannot connect input ports\n");
	}

	if (jack_connect (client, jack_port_name (output_port), "alsa_pcm:out_1")) {
		fprintf (stderr, "cannot connect output ports\n");
	}

	/* Since this is just a toy, run for a few seconds, then finish */
	sleep (10);
	jack_client_close (client);
	exit (0);
}

This code looks a bit more complex. To understand it, think of JACK as a big, complex switchboard with inputs and outputs, on which you can interconnect devices (microphone, sound card, programs, etc.) by plugging them in. The program copies whatever is connected to its input to whatever is connected to its output, so, generally speaking, it behaves like a wire or cable. This explanation is deliberately simple. If you want a complete analysis, go to dis-dot-dat.net.

All the handling is done in the callback while the main program flow keeps running (that’s why we’ve used sleep(10)). By the way, there are some real-time considerations when implementing the callback, such as using only non-blocking, deterministic calls (malloc, printf, mutex_*, etc., must be banned). Here we’re hard-coding a connection between our output_port and alsa_pcm:out_1, but if you need something more flexible, an interesting function is jack_get_ports (client, NULL, NULL, JackPortIsPhysical|JackPortIsOutput), which returns a list of the physical ports flagged as outputs. Note that in JACK’s terms an output port is one that data can be read from, so for a sound card these are its capture ports; pass JackPortIsInput instead to list the physical ports that accept data for playback.
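
The real-time constraints on the callback raise the question of how to get audio out of it without locks or allocation. The usual answer is a ring buffer shared with an ordinary thread. JACK ships its own in jack/ringbuffer.h; the sketch below is a from-scratch stand-in using C11 atomics so that it compiles without JACK, and the names ringbuf, rb_write, rb_read, and RB_SIZE are hypothetical. It assumes exactly one writer (the callback) and one reader.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RB_SIZE 1024	/* capacity in samples; must be a power of two */

/* Single-producer, single-consumer ring buffer of float samples.
 * Indices only ever grow and are masked on access. A zero-initialised
 * ringbuf (for example a static one) is an empty buffer. */
typedef struct {
	float data[RB_SIZE];
	atomic_size_t head;	/* advanced only by the producer */
	atomic_size_t tail;	/* advanced only by the consumer */
} ringbuf;

/* Producer side (callback): no locks, no allocation, bounded time.
 * Copies up to `n` samples and returns how many fitted. */
size_t rb_write(ringbuf *rb, const float *src, size_t n)
{
	size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
	size_t space = RB_SIZE - (head - tail);
	size_t i;

	if (n > space)
		n = space;
	for (i = 0; i < n; i++)
		rb->data[(head + i) & (RB_SIZE - 1)] = src[i];
	atomic_store_explicit(&rb->head, head + n, memory_order_release);
	return n;
}

/* Consumer side (ordinary thread): copies up to `n` samples out and
 * returns how many were available. */
size_t rb_read(ringbuf *rb, float *dst, size_t n)
{
	size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
	size_t avail = head - tail;
	size_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		dst[i] = rb->data[(tail + i) & (RB_SIZE - 1)];
	atomic_store_explicit(&rb->tail, tail + n, memory_order_release);
	return n;
}
```

In a capture client, process() would call rb_write on the input buffer, and the main thread would call rb_read and do the slow work (writing to disk, printing) outside the callback.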

Practical implementation

So many APIs — what now? What should a programmer wanting to use sound choose?

For somebody who wants to code a sound server or have a direct access to sound, ALSA is the obvious choice.
If you’re sure that your program will run on only one desktop environment and will be closely linked to it, then choose ESD (for Enlightenment or GNOME) or aRTs (for KDE).
If your system doesn’t need to be portable for the moment, JACK is full of promise.
For multi-systems/OS portability, join us tomorrow for part two of this discussion.

To be continued….

Vincenot has been a Linux user for eight years, and is currently a student at University Louis Pasteur in Strasbourg.
