April 10, 2008

How much can you improve network throughput with a high-end NIC?

Author: Ben Martin

What sort of impact can you expect from switching a machine from the Gigabit Ethernet NIC that comes on its motherboard to a higher-end Intel desktop NIC? I benchmarked two common gigabit NICs found on motherboards against two Intel PCIe desktop gigabit NICs, targeting the specific purpose of accessing an NFS share over the network. The short version: throughput for sequential read/write operations didn't improve much, but latency was much better, allowing anything that needs a network round trip, like create, delete, and seek, to work much faster.

The two machines I used for testing were an AMD X2 4200 and an Intel Q6600 quad-core CPU on a P35 motherboard. The AMD machine uses the Nvidia CK804 Ethernet Controller (rev a3) with the forcedeth driver, while the Intel machine has a Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12) driven by the sky2 driver.

The non-motherboard NICs are two Intel Pro/1000 PT gigabit PCIe NICs. Unless otherwise specified, I performed my tests with a DLink DGS-1008D gigabit switch between the two computers. Apart from the two machines being tested, the switch was not under additional load. I performed some Intel NIC tests without the switch; latency was about 10-20% better without the switch but bandwidth was similar.

I performed benchmarks using the lmbench (version 3.0-a9), fio (version 1.18), and bonnie++ (version 1.03) tools. lmbench provides many micro benchmarks; the most interesting for networks are bw_tcp, which measures network bandwidth, and lat_tcp and lat_udp, which measure network latency for TCP and UDP communications respectively. I used fio and bonnie++ to measure performance when accessing a filesystem that is stored on a RAID-5 array and shared using NFS. I used fio mainly to see what difference the change of NICs makes to some typical filesystem access patterns on an NFS share.
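To make the idea of a TCP latency micro benchmark concrete, here is a minimal Python sketch that bounces a one-byte message between a client and an echo server and reports the mean round-trip time. It is not lmbench's implementation; the function names, the loopback address, and the round count are assumptions chosen for illustration, and Python's own overhead means the number it prints is not comparable to lat_tcp output.

```python
import socket
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo each byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so tiny messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(data)


def measure_tcp_round_trip(host: str, port: int, rounds: int = 1000) -> float:
    """Return the mean round-trip time in microseconds for a 1-byte message."""
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            sock.sendall(b"x")
            sock.recv(1)  # block until the echoed byte comes back
        elapsed = time.perf_counter() - start
    return elapsed / rounds * 1e6


def run_loopback_demo(rounds: int = 1000) -> float:
    """Start a local echo server (hypothetical helper) and time round trips to it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()
    try:
        return measure_tcp_round_trip("127.0.0.1", port, rounds)
    finally:
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(f"mean TCP round trip: {run_loopback_demo():.1f} microseconds")
```

To time a real link instead of loopback, the echo server would run on the remote machine and measure_tcp_round_trip would be pointed at that machine's address.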

To get an impression of the maximum values possible for the lmbench tests I first ran the tests against localhost on both machines.

                          AMD X2 4200   Intel Q6600
bw_tcp (MB/sec)                   766          1298
lat_tcp (microseconds)           31.8          29.8
lat_udp (microseconds)           31.9          30.2

When communicating over the motherboard NICs, bw_tcp scored 109.43MB/sec, the TCP latency test scored 1,459 microseconds, and UDP latency came in at 1,129 microseconds. With the Intel NICs at both ends, bw_tcp scored 87.47MB/sec, TCP latency came in at 121 microseconds, and UDP latency was 100 microseconds. The network latency improvement was the most surprising result, so much so that I reverted to the onboard NICs to verify the results again. I could not get the two Intel NICs to match the bw_tcp score of the motherboard NICs by enabling jumbo frames or by changing the e1000 module parameters InterruptThrottleRate, RxDescriptors, or TxDescriptors. Building the e1000-7.6.15.4 driver from Intel's Web site also did not noticeably improve on the 87.5MB/sec bw_tcp result for the network with two Intel NICs. In short, although latency went down by an order of magnitude with the Intel NICs, I could not find a way to bring the maximum throughput back up to the level achieved with the onboard NICs. I am not sure how to explain this issue with the Intel NICs.
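As a quick check on the "order of magnitude" remark, the following Python sketch works out the relative changes from the figures quoted above. The variable names are mine; the numbers are the ones reported in the text.

```python
# lmbench figures as reported in the article text.
onboard = {"bw_tcp": 109.43, "lat_tcp_us": 1459, "lat_udp_us": 1129}
intel = {"bw_tcp": 87.47, "lat_tcp_us": 121, "lat_udp_us": 100}

# Bandwidth: percentage change going from the onboard NICs to the Intel NICs.
bandwidth_change_pct = (intel["bw_tcp"] - onboard["bw_tcp"]) / onboard["bw_tcp"] * 100

# Latency: how many times lower the Intel figures are.
tcp_latency_factor = onboard["lat_tcp_us"] / intel["lat_tcp_us"]
udp_latency_factor = onboard["lat_udp_us"] / intel["lat_udp_us"]

print(f"bw_tcp change:      {bandwidth_change_pct:+.1f}%")   # -20.1%
print(f"TCP latency factor: {tcp_latency_factor:.1f}x lower")  # 12.1x
print(f"UDP latency factor: {udp_latency_factor:.1f}x lower")  # 11.3x
```

So the Intel NICs gave up roughly a fifth of the bw_tcp score in exchange for latencies about 11 to 12 times lower.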

I used the fio benchmark to test sequential reads of a 1,024MB file, random reads on a 128MB file, and random read/write activity on a 512MB file. The results show a gain in performance for random reads and for writing with the two Intel NICs, likely due to the improved latency they offer. For random reads, the minimum throughput for the Intel NICs is about twice the minimum throughput for the motherboard NICs, but the Intel NICs post a much lower minimum throughput in the random read/write test.

Results in Mbps            Motherboard NICs               Intel Pro/1000 PT
                           min     max        aggrb       min     max        aggrb
Sequential Reads           77.1    113.8      102         96.4    102.8      99.6
Random Reads               6.7     16.7       13.5        15.6    17.4       16.2
Random Reads and Writes    13/0    15.3/15.3  5.5/5.5     8/0     16.1/15.4  5.5/7.1

The last test uses the bonnie++ filesystem benchmark on the RAID-5 NFS share. For the motherboard NICs, the benchmark took 14 minutes and 40 seconds, while the two Intel NICs completed the test in a little over 11.5 minutes. Notice that the sequential output/input figures are similar between the two network configurations. I think that the reduced latency of the Intel NICs helped them produce more seek, create, read, and delete operations per second. The differences in latency produce a noticeable overall difference in performance, shaving three minutes off a 15-minute run time.

Bonnie++ on a filesystem on RAID-5 over NFS using motherboard NICs.

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
alkid            4G 42652  80 41138   5 25811   7 58229  95 102953  10  1479   6
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   190   1  1021   2   175   0   196   1  1051   1   176   0

Bonnie++ on a filesystem on RAID-5 over NFS using two Intel Pro/1000 PT NICs.

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
alkid            4G 44477  80 40783   5 25534   7 58719  96 97800   9  1708   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   359   2  6139  11   303   1   358   1  6634  12   296   0
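To put the two bonnie++ reports side by side, this short Python sketch computes the Intel-to-motherboard ratio for each per-second figure and converts an operations-per-second rate into an average time per operation. The dictionary keys and the helper name are my own labels; the numbers are copied from the reports above.

```python
# Operations per second from the two bonnie++ reports above.
motherboard = {
    "seeks": 1479,
    "sequential create": 190, "sequential read": 1021, "sequential delete": 175,
    "random create": 196, "random read": 1051, "random delete": 176,
}
intel = {
    "seeks": 1708,
    "sequential create": 359, "sequential read": 6139, "sequential delete": 303,
    "random create": 358, "random read": 6634, "random delete": 296,
}


def microseconds_per_op(ops_per_sec: float) -> float:
    """Average time one operation takes, in microseconds."""
    return 1e6 / ops_per_sec


# Ratio above 1.0 means the Intel NICs completed more operations per second.
speedups = {name: intel[name] / motherboard[name] for name in motherboard}

for name, factor in speedups.items():
    print(f"{name:18s} {factor:5.2f}x")

for label, results in (("motherboard", motherboard), ("Intel", intel)):
    per_op = microseconds_per_op(results["sequential read"])
    print(f"{label}: about {per_op:.0f} microseconds per sequential read")
```

The read operations in the create tests work out to roughly 979 microseconds each on the motherboard NICs and roughly 163 microseconds each on the Intel NICs. Both are in the same range as the TCP latencies measured earlier (1,459 and 121 microseconds), which fits the latency explanation, although a single bonnie++ operation over NFS need not map to exactly one network round trip.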

When you already have one or two gigabit NICs on your motherboard, it can be tough to justify spending around $50 per machine to add another NIC that runs at gigabit speeds. If you have a dual or quad core CPU in your machine, you are not likely to be overly concerned about losing a little processor time when under heavy network load. However, if you have a file server with a nice fast RAID that is shared over NFS and you are a heavy user of that filesystem, some extra NICs might be a good investment. As the benchmarks show, you shouldn't really expect more bandwidth for single sequential bulk transfers, but some operations, such as file creation, deletion, and seeking, can be noticeably faster, probably due to the lower latency of the Intel NICs.
