September 5, 2008

Use LatencyTOP to find out where process latency is coming from

Author: Ben Martin

The latency of the execution of a particular task can be affected by what tasks a system is running, the condition of the network the machine is connected to, and how well the various server machines on the network are performing. LatencyTOP is a command-line tool and kernel patch that lets you see what is causing latency in the applications on your system.

Unfortunately, because the LatencyTOP project is still in its early days, you might have to patch your kernel in order to use LatencyTOP. You should start by downloading both the userland application and the kernel patch from the LatencyTOP page.

The simplest method to determine whether you have to recompile a kernel for your distribution is to build the latencytop command and attempt to run it, as shown below. You have to run LatencyTOP as the root user. Note that for Fedora 9 systems running the kernel or a later version, you already have LatencyTOP available without any kernel modifications or need to reboot. Those running Fedora 9 can simply install the latencytop package with yum. Ubuntu Hardy lacks a LatencyTOP package. Community members have contributed a few 1-Click install files for LatencyTOP on openSUSE 11.

$ tar xzvf /.../latencytop-0.4.tar.gz
$ cd ./latencytop*
$ make
$ ./latencytop
Please enable the CONFIG_LATENCYTOP configuration in your kernel.
# sudo ./latencytop
LatencyTOP version 0.4 (C) 2008 Intel Corporation

Cause Maximum Percentage
opening cdrom device 8.7 msec 33.5 %
Executing raw SCSI command 7.6 msec 38.0 %
Waiting for event (poll) 3.9 msec 5.0 %
SCSI cdrom ioctl 3.7 msec 5.7 %
Waiting for event (select) 3.6 msec 17.1 %
Waiting for TTY to finish sending 1.5 msec 0.5 %
Userspace lock contention 0.2 msec 0.2 %

If you find that you do have to patch, compile, install and boot a new kernel, apply the patch with the following commands. You must also enable the CONFIG_LATENCYTOP=y option before building the kernel. If all goes well you should have a /proc/sys/kernel/latencytop file when you boot into your new kernel.

$ tar xjf /.../linux-
$ cd ./linux-
$ cat /.../latencytop.patch | patch -p1
$ make menuconfig
$ make bzImage modules

The best way to invoke latencytop is by passing no command-line options. LatencyTOP will then start with a curses interface and continually update the display with information about what process has the greatest latency on your system. If you pass the -d option, LatencyTOP will print the current top latencies and exit. If you pass --block, the program will print some debug-level messages about block IO.

Keyboard input that LatencyTOP accepts when running in curses mode is not case-sensitive. The R key will reset the display, update the current latency information for all processes, and select a process that has high latency and show the details for that process. At the bottom of the LatencyTOP display you will see the names of processes drawn from left to right. This is part of the list of all processes on the system, which you can freely move through to select the process that is of greatest interest to you. Any of the X, B, or C keys will move one process ID (PID) forward, and any of the Z, A, or D keys will move one PID backward. Although the code that interprets the keyboard input events does not make it obvious, you can also use the left, right, up, and down arrow keys to move through the PID list. The current process is shown in reverse video print. The Q key will quit LatencyTOP.

One thing that I found counterintuitive about LatencyTOP is that you have to keep pressing the R key in order for LatencyTOP to refresh its information about process latencies. I couldn't find a way to tell the tool to refresh the latency information automatically every n milliseconds. If you are monitoring one of the processes that is among those with the highest latencies on the system, this is not an issue -- just pressing R a few times until LatencyTOP selects the process you are interested in works fine. But I would like to be able to select the process I am interested in with the arrow keys and have the latency information displayed be regularly updated for that process.

Shown below is the latency of the updatedb command, which builds a filesystem index that can later be used to search for files by their name. When this data was taken, updatedb was scanning an XFS filesystem, which accounts for all the xfs_buf calls in the latency printout. The bottom line in the capture is an abridged version of the PID list that can be navigated with the keyboard arrow keys or the other hotkeys mentioned above.

LatencyTOP version 0.4 (C) 2008 Intel Corporation

Cause Maximum Percentage
xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 76.3 msec 29.2 %
Walking directory tree 56.8 msec 0.9 %
Reading directory content 25.3 msec 1.0 %
Scheduler: waiting for cpu 23.6 msec 24.6 %
xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 20.7 msec 12.9 %
xfs_buf_lock _xfs_buf_find xfs_buf_get_flags xfs_b 19.1 msec 0.3 %
stat() operation 18.1 msec 0.1 %
opening cdrom device 10.1 msec 0.9 %
Executing raw SCSI command 8.2 msec 0.9 %

Process updatedb (7799) Total: 1232.8 msec
Scheduler: waiting for cpu 23.6 msec 94.2 %
Reading directory content 15.8 msec 2.1 %
xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 13.0 msec 3.2 %_do_buf xfs_da_read_buf xfs_da_node_lookup_int xfs
xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 2.7 msec 0.6 %obp xfs_iread xfs_iget_core xfs_iget xfs_dir_looku
p_int xfs_lookup xfs_vn_lookup do_lookup


... ssh ... updatedb ... bash

Shown below is the output for the prelink -a process while it was running. As you can see, there is nothing at the time the latency information was captured that shows a huge latency that we might try to optimize out of prelink. The top latency events are all for disk IO, which is to be expected because IO calls are the main thing that prelink does that it might have to wait for the completion of. On the system used for this article, I found that processes could be "waiting for cpu" for anywhere in the range of 10 to about 40 msec, and where a process landed in this range was rather random.

Process prelink (7965) Total: 288.0 msec
Scheduler: waiting for cpu 42.2 msec 30.0 %
Writing a page to disk 18.2 msec 52.6 %
Walking directory tree 17.9 msec 10.4 %
Pagecache sync readahead 12.1 msec 4.2 %
EXT3: Waiting for journal access 5.2 msec 1.8 %
Reading from a pipe 2.9 msec 1.0 %

Sometimes finding out what kernel function has been called when latency occurs can be the piece of information that allows you to work out which part of the system or network is causing your application not to run smoothly. There has been a recent burst of "top"-like tools, such as xrestop for X11 resources and the well-known PowerTOP tool. LatencyTOP takes its place alongside these tools, letting you see what is giving you the most latency on your system.


  • System Administration
  • Tools & Utilities
Click Here!