August 13, 2008

Benchmarking network performance with Network Pipemeter, LMbench, and nuttcp

Author: Ben Martin

Network latency and bandwidth are the two metrics most likely to be of interest when you benchmark a network. Even though most service and product advertising focuses on bandwidth, at times the latency can be a more important metric. Here's a look at three projects that include tools to test your network performance: nepim "network pipemeter," LMbench, and nuttcp.

For this article I built each utility from source on a 64-bit Fedora 9 machine. I used nepim version 0.51, LMbench version 3, and nuttcp version 5.5.5.

For testing, I used a network link with two Gigabit Ethernet network interface cards configured for network bonding. As the results show, however, something was clearly not functioning correctly, because I was unable to reach the theoretical 2-gigabit bandwidth. The nepim and nuttcp benchmarks below show that communication from the server is faster than sending data to the server, which might be an effect of the network interface bonding.


nepim is packaged for openSUSE 11 as a 1-Click install but is not in the Fedora or Ubuntu repositories. It requires the liboop library, which is likewise packaged for openSUSE but not for Fedora or Ubuntu. You can build and install liboop with the usual ./configure; make; sudo make install procedure.
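A typical liboop build session looks like the following; the tarball name is illustrative, so substitute the version you downloaded:

```shell
$ tar xzvf liboop-1.0.tar.gz   # tarball name is an example; use your downloaded version
$ cd ./liboop*
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig                # refresh the linker cache so the nepim build can find liboop
```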

nepim does not use autotools to build. To compile it, change into the src directory and run make. I found that I had to include an additional define in the Makefile to avoid a duplicate definition of a data structure. The change is shown below, together with the installation step, as the Makefile has no install target.

$ cd src
$ vi Makefile
ifeq ($(PLATFORM),Linux)
LDFLAGS += -ldl
$ make
$ sudo install -m 555 nepim /usr/local/bin

If you invoke nepim without any command-line options it starts in server mode. In this mode nepim will listen on every interface on the system and accept both UDP and TCP connections. Running nepim with the -c option puts nepim into client mode and allows you to specify one or more servers that nepim should connect to and start benchmarking the network link.

Starting nepim on the server is shown below, together with the output from the server when a client connects. The lines beginning with "6 cur" are printed as the benchmark is being performed. The final lines showing the mean (average), minimum, and maximum figures for kilobits per second in both directions and packets sent and received per second are printed when the client has disconnected.

$ nepim
nepim - network pipemeter - version 0.51
server: tcp_read=32768 tcp_write=32768 udp_read=4096 udp_write=4096
3: TCP socket listening on ::,1234
6: TCP incoming: ::ffff:,38176
6: send=1 bit_rate=-1 pkt_rate=-1 interval=2 duration=10 delay=250000 ....
6: sweep write: floor=32768 ceil=32768
6: pmtud_mode=1 path_mtu=1500 mss=1448 tos=0 ttl=64 mcast_ttl=64 win_recv=87380 win_send=16384 sock_ka=1 nodelay=0

kbps_in kbps_out rcv/s snd/s
6 cur 8 273194.97 676940.88 2619.50 2583.00
6 cur 6 235308.55 722075.62 2435.50 2754.50
6 cur 4 223439.58 723386.38 2282.00 2759.50
6 cur 2 255724.64 702152.69 2346.00 2678.50
6 avg 0 246072.14 708041.75 2413.30 2701.10
6 min 0 223439.58 676940.88 2282.00 2583.00
6 max 0 273194.97 723386.38 2619.50 2759.50
write: errno=104: Connection reset by peer
write: connection lost on TCP socket 6
6: pmtud_mode=1 path_mtu=1500 mss=1448 tos=0 ttl=64 mcast_ttl=64 win_recv=603680 win_send=256360 sock_ka=1 nodelay=0

By default, traffic flows only from the server to the client after the client connects. You can change this with the -s option, which makes the client send to the server, or the -d option, which tests both directions at once. The session on a client that connects to the above server is shown below.

$ nepim -c -d
nepim - network pipemeter - version 0.51
client: tcp_read=32768 tcp_write=32768 write_floor=32768 write_ceil=32768 step=1
not a UNIX domain path: errno=2: No such file or directory
3: TCP socket connected to,1234
3: sending: hello server_send=1 bit_rate=-1 pkt_rate=-1 stat_interval=2 ...
3: greetings sent to,1234
3: pmtud_mode=1 path_mtu=1500 mss=1448 tos=0 ttl=64 mcast_ttl=1 win_recv=87380 win_send=16384 sock_ka=1 nodelay=0

kbps_in kbps_out rcv/s snd/s
3 cur 8 675722.31 273269.25 2696.00 1086.00
3 cur 6 719693.06 235371.25 3278.50 953.50
3 cur 4 725370.31 223025.72 3067.50 898.50
3 cur 2 700528.75 255723.53 2785.00 1019.00
3 avg 0 706910.69 246072.14 2943.30 986.20
3 min 0 675722.31 223025.72 2696.00 898.50
3 max 0 725370.31 273269.25 3278.50 1086.00
3: pmtud_mode=1 path_mtu=1500 mss=1448 tos=0 ttl=64 mcast_ttl=1 win_recv=1688544 win_send=99000 sock_ka=1 nodelay=0
nepim: no event sink registered
nepim: done

If you start the nepim server using -U /tmp/nepim-socket it will use local domain stream sockets instead of TCP/IP networking. You supply the path to the socket to the -c option on the client to connect to this local socket for benchmarking. This is useful if you want to know how fast nepim can possibly communicate without having the network card slow things down.
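For example, a local-socket session would look something like the following; the socket path is arbitrary:

```shell
$ nepim -U /tmp/nepim-socket        # server: listen on a local domain socket
$ nepim -c /tmp/nepim-socket -d     # client: connect to the same path and test both directions
```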

Shown below are the figures for nepim run against a local domain socket on an Intel Q6600 quad core CPU. The Q6600 can manage about 7 gigabits in both directions. The CPU was running at slightly over 50% capacity for the duration of the test, giving both the nepim client and server full use of a CPU core each.

kbps_in kbps_out rcv/s snd/s
3 cur 8 7100432.50 7105806.50 27203.50 27106.50
3 cur 6 7268335.50 7266631.50 27887.00 27720.00
3 cur 4 7105020.00 7108296.50 27196.00 27116.00
3 cur 2 7189823.50 7188644.00 27557.00 27422.50
3 avg 0 7154958.50 7156819.50 27413.10 27301.10
3 min 0 7100432.50 7105806.50 27196.00 27106.50
3 max 0 7268335.50 7266631.50 27887.00 27720.00

To run more than a single session at once, use the -n option when running the client and supply the number of connections you would like. When I used -n 2 in the local socket test, each stream achieved about 4 to 4.5 gigabits/second, so bandwidth was improved but not doubled.
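A parallel run can be sketched as follows; serverhost is a placeholder for your server's name:

```shell
$ nepim -c serverhost -d -n 2   # two concurrent bidirectional TCP streams to serverhost (a placeholder)
```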

When you run the nepim client with the -u option, it will use UDP instead of TCP for communications. The output for UDP includes statistics about the number of packets that were lost, as shown below.

$ nepim -c -d -u
kbps_in kbps_out rcv/s snd/s loss ooo LOST
3 0 0 cur 8 595738.62 808632.31 18180.50 24677.50 .0495 .0262 1894
3 0 0 cur 6 505872.38 868532.25 15438.00 26505.50 .0090 .0050 2174
3 0 0 cur 4 585842.69 825393.12 17878.50 25189.00 .0177 .0097 2817
3 0 0 cur 2 563150.88 872955.88 17186.00 26640.50 .0232 .0115 3633
3 0 0 avg 0 546350.69 866831.56 16673.30 26453.60 .0389 .0190 6749
3 0 0 min 0 505872.38 808632.31 15438.00 24677.50 .0090 .0050
3 0 0 max 0 595738.62 872955.88 18180.50 26640.50 .0495 .0262

nepim is great for seeing what the bandwidth is in both directions between two machines. Being able to test send, receive, and bidirectional throughput lets you see whether you have problems in only one direction. The UDP tests also report how many packets are lost, so you can check, for example, whether placing a switch between two hosts increases packet loss.


Next we'll take a look at LMbench, which includes many tools to benchmark network, memory, filesystem, and other system components' performance. For this article I'll explore only the network-related benchmarks.

LMbench is packaged for Ubuntu but not for Fedora or openSUSE. It does not use autotools to build. Once you have run make to generate the executables, they can be executed directly from where they are created. The build procedure is shown below:

$ tar xzvf /.../lmbench3.tar.gz
$ cd ./lmbench*
$ make -k
$ cd ./bin/x86_64-linux-gnu
$ ls
bw_file_rd lat_connect lat_proc lib_mem.o loop_o
bw_mem lat_ctx lat_rpc lib_sched.o memsize
bw_mmap_rd lat_fcntl lat_select lib_stats.o mhz
bw_pipe lat_fifo lat_sem lib_tcp.o msleep
bw_tcp lat_fs lat_sig lib_timing.o par_mem
bw_unix lat_http lat_syscall lib_udp.o par_ops
disk lat_mem_rd lat_tcp lib_unix.o stream
enough lat_mmap lat_udp line timing_o
flushdisk lat_ops lat_unix lmbench.a tlb
getopt.o lat_pagefault lat_unix_connect lmdd
hello lat_pipe lib_debug.o lmhttp

To measure the TCP bandwidth between two hosts, start bw_tcp -s on the server and bw_tcp servername on the client. The client session is shown below:

$ ./bw_tcp
0.065536 88.32 MB/sec

Many benchmarks in the LMbench suite follow the same pattern as bw_tcp above: you start a server by supplying -s as the sole argument and run clients by passing the host name or IP address of the server. Shown below are the TCP and UDP latency tests, as well as a test of the latency simply to complete a TCP/IP connection over the network. These clients offer few options for experimenting with different network queue sizes and other variables that might affect performance.

$ ./lat_tcp
TCP latency using 685.9002 microseconds
$ ./lat_udp
UDP latency using 1378.2743 microseconds
$ ./lat_connect
TCP/IP connection cost to 185.5484 microseconds
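Each of these clients needs a matching server on the remote host. Following the bw_tcp pattern, a complete lat_tcp run would look something like this; testbox is a hypothetical server name, and the -S shutdown flag follows the general LMbench convention, so check your version's man page:

```shell
server$ ./lat_tcp -s          # start the latency server on the remote host
client$ ./lat_tcp testbox     # measure TCP latency to testbox (a placeholder name)
client$ ./lat_tcp -S testbox  # ask the server to shut down when you are finished
```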

The LMbench network tests have few options, but they do provide an easy way to measure your current network bandwidth and latency. When you are changing the kernel module options for your NIC, or replacing a NIC with a new one, LMbench provides a quick test of how much your latency has improved.


Finally, we'll take a look at nuttcp, which includes many options for tweaking buffer lengths, nodelay behavior, and type-of-service fields to see what impact these have on your network performance. nuttcp can show either the overall bandwidth or the bandwidth achieved in the last second.

nuttcp is available in the Fedora 9 repositories but not for openSUSE or Ubuntu. The build and installation are shown below:

tar xjvf nuttcp-5.5.5.tar.bz2
cd ./nuttcp*
cc -O3 -o nuttcp nuttcp-5.5.5.c
strip nuttcp
sudo install -m 555 nuttcp /usr/local/bin/

Start the server with nuttcp -S. The client can be invoked with many options, followed by the server host name(s) at the end of the command line. The test below prints the bandwidth every second (specified by the -i1 option) while it is running, and runs for 10 seconds before completing.

$ nuttcp -v -v -i1
nuttcp-t: v5.5.5: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp ->
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: connect to with mss=1448
nuttcp-t: send window size = 8192, receive window size = 43690
nuttcp-r: v5.5.5: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp

nuttcp-r: interval reporting every 1.00 second
nuttcp-r: accept from
nuttcp-r: send window size = 8192, receive window size = 43690
85.3719 MB / 1.00 sec = 715.9765 Mbps
86.3684 MB / 1.00 sec = 724.5411 Mbps
85.9188 MB / 1.00 sec = 720.7551 Mbps
84.4201 MB / 1.00 sec = 708.2533 Mbps
87.7772 MB / 1.00 sec = 736.2222 Mbps
86.7372 MB / 1.00 sec = 727.5696 Mbps
91.4327 MB / 1.00 sec = 767.0191 Mbps
89.4166 MB / 1.00 sec = 750.2228 Mbps
85.4859 MB / 1.00 sec = 717.0937 Mbps
87.0377 MB / 1.00 sec = 729.9696 Mbps
nuttcp-t: 870.1633 MB in 10.00 real seconds = 89091.75 KB/sec = 729.8396 Mbps
nuttcp-t: 13923 I/O calls, msec/call = 0.74, calls/sec = 1392.10
nuttcp-t: 0.0user 22.3sys 0:10real 224% 0i+0d 0maxrss 0+3pf 16198+1383csw

nuttcp-r: 870.1633 MB in 10.00 real seconds = 89083.52 KB/sec = 729.7722 Mbps
nuttcp-r: 55254 I/O calls, msec/call = 0.19, calls/sec = 5524.09
nuttcp-r: 0.0user 6.7sys 0:10real 67% 0i+0d 0maxrss 0+20pf 62619+635csw

You can also run multiple streams at once; use -N3 to start three connections, for example. The -B option makes the client receive traffic only, while the -D option transmits only. The default is for communication in both directions.

$ nuttcp -v -v -N3 -B
nuttcp-t: v5.5.5: socket
nuttcp-t: buflen=65536, nstream=3, port=5001 tcp ->
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: connect to with mss=1448
nuttcp-t: send window size = 8192, receive window size = 43690
nuttcp-t: 1239.8698 MB in 10.00 real seconds = 126944.75 KB/sec = 1039.9314 Mbps
nuttcp-t: 19838 I/O calls, msec/call = 0.52, calls/sec = 1983.52
nuttcp-t: 0.0user 41.2sys 0:10real 413% 0i+0d 0maxrss 0+3pf 4758+3081csw

nuttcp-r: v5.5.5: socket
nuttcp-r: buflen=65536, nstream=3, port=5001 tcp
nuttcp-r: accept from
nuttcp-r: send window size = 8192, receive window size = 43690
nuttcp-r: 1239.8698 MB in 10.00 real seconds = 126934.93 KB/sec = 1039.8509 Mbps
nuttcp-r: 29899 I/O calls, msec/call = 0.34, calls/sec = 2989.25
nuttcp-r: 0.0user 8.5sys 0:10real 86% 0i+0d 0maxrss 0+18pf 12519+1847csw

$ nuttcp -v -v -N3 -D
nuttcp-r: v5.5.5: socket
nuttcp-r: buflen=65536, nstream=3, port=5001 tcp
nuttcp-r: accept from
nuttcp-r: send window size = 8192, receive window size = 43690
nuttcp-r: 806.2317 MB in 10.00 real seconds = 82545.65 KB/sec = 676.2140 Mbps
nuttcp-r: 67104 I/O calls, msec/call = 0.15, calls/sec = 6709.39
nuttcp-r: 0.0user 5.7sys 0:10real 57% 0i+0d 0maxrss 0+18pf 73018+378csw

nuttcp offers options similar to nepim's and focuses heavily on measuring the network bandwidth between hosts. With -i1, nuttcp prints bandwidth statistics every second while the test runs, as nepim does by default. The nuttcp man page documents many options for type of service and buffer sizes that you can set explicitly, so you can check whether your particular hardware and drivers perform poorly in certain configurations. Running nepim --help shows many more options for configuring buffers, window sizes, and TCP options.
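As a sketch of that kind of tuning run, the following uses nuttcp's -w (window size in KB), -l (write length in bytes), and -T (run time in seconds) options; serverhost is a placeholder, and the values are illustrative rather than recommendations:

```shell
$ nuttcp -v -i1 -T10 -w512 -l8192 serverhost   # 512KB windows, 8KB writes, 10-second run
```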

Whether to use nuttcp or nepim is largely a matter of convenience. Since nepim is packaged for openSUSE and nuttcp for Fedora, the choice may come down to which distribution you are running.

Both nepim and nuttcp provide options for setting the size of network packet queues and other more advanced settings, such as the TCP maximum segment size, so you can try to improve network performance by changing the software setup at each end. Meanwhile, the LMbench tests are quick to run and provide useful insight into the available bandwidth and latency on your network link.

