Linux.com

Feature: Desktop Hardware

How much can you improve network throughput with a high-end NIC?

By Ben Martin on April 10, 2008 (4:00:00 PM)


What sort of impact can you expect from switching a machine from the Gigabit Ethernet NIC that comes on its motherboard to a higher-end Intel desktop NIC? I benchmarked two common gigabit NICs found on motherboards against two Intel PCIe desktop gigabit NICs, targeting the specific purpose of accessing an NFS share over the network. The short version: throughput for sequential read/write operations didn't improve much, but latency was much better, allowing anything that needs a network round trip, like create, delete, and seek, to work much faster.

The two machines I used for testing were an AMD X2 4200 and an Intel Q6600 quad core CPU on a P35 motherboard. The AMD machine uses the Nvidia CK804 Ethernet Controller (rev a3) with the forcedeth driver, while the Intel machine has a Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12) driven by the sky2 driver.

The non-motherboard NICs are two Intel Pro/1000 PT gigabit PCIe NICs. Unless otherwise specified, I performed my tests with a D-Link DGS-1008D gigabit switch between the two computers. Apart from the two machines being tested, the switch was not under additional load. I performed some Intel NIC tests without the switch; latency was about 10-20% better without the switch, but bandwidth was similar.

I performed benchmarks using the lmbench (version 3.0-a9), fio (version 1.18), and bonnie++ (version 1.03) tools. lmbench provides many micro benchmarks; the most interesting for networks are bw_tcp, which measures network bandwidth, and lat_tcp and lat_udp, which measure network latency for TCP and UDP communications respectively. I used fio and bonnie++ to measure performance when accessing a filesystem that is stored on a RAID-5 array and shared using NFS. I used fio mainly to see what difference the change of NICs makes to some typical filesystem access patterns on an NFS share.
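To make the idea of a TCP latency test concrete, here is a minimal Python sketch of a ping-pong measurement: one byte goes to an echo server and comes back, and the mean time per round trip is reported. This is not lmbench's lat_tcp implementation; the function names, the one-byte message, and the default of 1,000 rounds are assumptions made for illustration. It runs both ends in one process over loopback, so it is comparable in spirit only to the localhost runs.

```python
import socket
import threading
import time


def echo_server(listener):
    # Accept a single client and echo every byte back until it disconnects.
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(data)


def tcp_round_trip_us(host="127.0.0.1", rounds=1000):
    """Return the mean one-byte TCP round-trip time in microseconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    client = socket.create_connection((host, port))
    # Disable Nagle's algorithm so each small message is sent immediately.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(b"x")
        client.recv(1)
    elapsed = time.perf_counter() - start

    client.close()
    server.join()
    listener.close()
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    print(f"mean TCP round trip: {tcp_round_trip_us():.1f} microseconds")
```

Measuring across a real network would mean running the echo half on the remote machine instead of in a local thread.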

To get an impression of the maximum values possible for the lmbench tests I first ran the tests against localhost on both machines.

lmbench network micro benchmarks on localhost for each machine

                          AMD X2 4200   Intel Q6600
bw_tcp (MB/sec)                   766          1298
lat_tcp (microseconds)           31.8          29.8
lat_udp (microseconds)           31.9          30.2

When communicating over the motherboard NICs, bw_tcp scored 109.43MB/sec, the TCP latency test scored 1,459 microseconds, and UDP latency came in at 1,129 microseconds. With the Intel NICs at both ends, bw_tcp scored 87.47MB/sec, TCP latency came in at 121 microseconds, and UDP latency was 100 microseconds. The network latency improvement was the most surprising -- so much so that I reverted to using the onboard NICs to verify the results again. I could not get the two Intel NICs to match the bw_tcp result for the motherboard NICs by enabling jumbo frames or by changing the e1000 module parameters InterruptThrottleRate, RxDescriptors, or TxDescriptors. Building the e1000-7.6.15.4 driver from Intel's Web site also did not noticeably improve on the 87.5MB/sec bw_tcp result for the network with two Intel NICs. In short, although the latency went down by an order of magnitude with the Intel NICs, I could not find a way to bring the maximum throughput back up to the level achieved with the onboard NICs. I am not sure how to explain this issue with the Intel NICs.
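The size of these differences is easier to see as ratios. The short Python sketch below uses only the figures quoted in the paragraph above; the variable and function names are illustrative, and the ratios do not depend on the bandwidth unit.

```python
# Figures quoted in the text: bw_tcp throughput and latencies in microseconds.
onboard = {"bw_tcp": 109.43, "lat_tcp": 1459.0, "lat_udp": 1129.0}
intel = {"bw_tcp": 87.47, "lat_tcp": 121.0, "lat_udp": 100.0}


def ratio(metric):
    # Onboard value divided by Intel value; above 1 means the Intel figure is smaller.
    return onboard[metric] / intel[metric]


tcp_latency_gain = ratio("lat_tcp")
udp_latency_gain = ratio("lat_udp")
# Percentage of bw_tcp throughput lost when moving to the Intel NICs.
bandwidth_loss_pct = (1 - intel["bw_tcp"] / onboard["bw_tcp"]) * 100

print(f"TCP latency: {tcp_latency_gain:.1f}x lower with the Intel NICs")
print(f"UDP latency: {udp_latency_gain:.1f}x lower with the Intel NICs")
print(f"bw_tcp: {bandwidth_loss_pct:.1f}% lower with the Intel NICs")
```

Latency improves by roughly a factor of 11 to 12, while bw_tcp throughput drops by about a fifth.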

I used the fio benchmark to test sequential reads of a 1,024MB file, random reads on a 128MB file, and random read/write activity on a 512MB file. The results show a gain in performance for random reads and for writing with the two Intel NICs, likely due to their improved latency. For random reads, the minimum throughput for the Intel NICs is about twice the minimum throughput for the motherboard NICs, but the Intel NICs scored a much lower minimum throughput for the random reads and writes.

fio benchmark on a filesystem stored on a three-disk RAID-5 and shared over NFS (results in MB/sec)

                           Motherboard NICs                Intel Pro/1000 PT
                           min     max        aggrb        min     max        aggrb
Sequential Reads           77.1    113.8      102          96.4    102.8      99.6
Random Reads               6.7     16.7       13.5         15.6    17.4       16.2
Random Reads and Writes    13/0    15.3/15.3  5.5/5.5      8/0     16.1/15.4  5.5/7.1

The last test uses the bonnie++ filesystem benchmark on the RAID-5 NFS share. For the motherboard NICs, the benchmark took 14 minutes and 40 seconds, while the two Intel NICs completed the test in a little over 11.5 minutes. Notice that the sequential output/input figures are similar between the two network configurations. I think that the reduced latency of the Intel NICs helped them complete more seek, create, read, and delete operations per second. The differences in latency produce a noticeable overall difference in performance, shaving three minutes off a 15-minute run time.

Bonnie++ on a filesystem on RAID-5 over NFS using motherboard NICs

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
alkid            4G 42652  80 41138   5 25811   7 58229  95 102953 10  1479   6
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   190   1  1021   2   175   0   196   1  1051   1   176   0

Bonnie++ on a filesystem on RAID-5 over NFS using two Intel Pro/1000 PT NICs

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
alkid            4G 44477  80 40783   5 25534   7 58719  96 97800   9  1708   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   359   2  6139  11   303   1   358   1  6634  12   296   0
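To put the per-operation differences side by side, the following Python sketch computes the speedup for each operations-per-second column from the two bonnie++ runs above. The numbers are copied from those results; the dictionary layout and names are only for illustration.

```python
# Operations per second from the two bonnie++ runs: (motherboard NICs, Intel NICs).
ops_per_sec = {
    "random seeks": (1479, 1708),
    "sequential create": (190, 359),
    "sequential read": (1021, 6139),
    "sequential delete": (175, 303),
    "random create": (196, 358),
    "random read": (1051, 6634),
    "random delete": (176, 296),
}

# Speedup is the Intel figure divided by the motherboard figure.
speedup = {
    name: intel / onboard for name, (onboard, intel) in ops_per_sec.items()
}

for name, factor in speedup.items():
    print(f"{name:18s} {factor:4.1f}x")
```

The read columns of the create tests gain the most, at about six times the motherboard figure, while create and delete rates gain less than a factor of two.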

When you already have one or two gigabit NICs on your motherboard, it can be tough to justify spending around $50 per machine to add another NIC that runs at gigabit speeds. If you have a dual or quad core CPU in your machine, you are not likely to be overly concerned about losing a little processor time when under heavy network load. However, if you have a file server with a nice fast RAID that is shared over NFS and you are a heavy user of that filesystem, some extra NICs might be a good investment. As the benchmarks show, you shouldn't really expect more bandwidth for single sequential bulk transfers, but some operations, such as file creation, deletion, and seeking, can be noticeably faster, probably due to the lower latency of the Intel NICs.
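One rough way to see why latency dominates small operations while bandwidth dominates bulk transfers is a toy cost model: time equals round trips multiplied by latency, plus bytes divided by bandwidth. The Python sketch below plugs in the lmbench figures from this article. It assumes the bw_tcp results are megabytes per second, and it ignores NFS protocol details, disk speed, caching, and pipelining, so treat the output as an illustration and not a prediction. The function name and the 4KB and 1GB sizes are hypothetical choices.

```python
def transfer_time_s(size_bytes, rtt_us, bandwidth_mb_s, round_trips=1):
    # Toy model: latency cost for each round trip plus time to move the payload.
    latency_s = round_trips * rtt_us / 1e6
    payload_s = size_bytes / (bandwidth_mb_s * 1e6)
    return latency_s + payload_s


# lmbench figures from the article: TCP latency (microseconds) and bw_tcp
# (assumed here to be MB/sec).
ONBOARD = {"rtt_us": 1459.0, "bandwidth_mb_s": 109.43}
INTEL = {"rtt_us": 121.0, "bandwidth_mb_s": 87.47}

SMALL = 4 * 1024   # hypothetical 4KB request
BULK = 1024 ** 3   # hypothetical 1GB sequential transfer

small_onboard = transfer_time_s(SMALL, **ONBOARD)
small_intel = transfer_time_s(SMALL, **INTEL)
bulk_onboard = transfer_time_s(BULK, **ONBOARD)
bulk_intel = transfer_time_s(BULK, **INTEL)

print(f"4KB request:  onboard {small_onboard * 1e6:.0f} us, Intel {small_intel * 1e6:.0f} us")
print(f"1GB transfer: onboard {bulk_onboard:.1f} s, Intel {bulk_intel:.1f} s")
```

Under these assumptions the small request is several times faster on the Intel NICs, while the bulk transfer is somewhat slower, which matches the direction of the lmbench results.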

Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.


Comments

on How much can you improve network throughput with a high-end NIC?

Note: Comments are owned by the poster. We are not responsible for their content.

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 64.162.14.2] on April 10, 2008 05:00 PM
We found through our own internal testing with Red Hat, SCO UnixWare, and Windows Server 2003 that the TCP/IP protocol stack affected the speeds tremendously. UnixWare and Linux were much faster than Windows overall, with a small gain in performance from exchanging NICs.

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 84.179.165.155] on April 10, 2008 08:05 PM
what about hardware offloading capabilities? did you check the impact on performance?

ie CK804:

<<ethtool -k eth0>>
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off

<<ethtool -i eth0>>
driver: forcedeth
version: 0.61

i think this will make a big difference!!

#

Re: How much can you improve network throughput with a high-end NIC?

Posted by: monkeyiq on April 11, 2008 03:02 AM
OK, bringing up the onboard NICs again and comparing, both the onboard and Intel NICs were running with the following:

rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off


For the 88E8056, ethtool -k also gives me this little charmer:

Cannot get device udp large send offload settings: Operation not supported


I'll have to check out how much things change when enabling ufo and gso for the Intel NICs. Though I can't really compare the hardware unless I get the Marvell NIC to be happy with UDP large send offload.

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 84.179.165.155] on April 11, 2008 10:11 AM
THX i will stay "tuned" ;)

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 209.33.59.228] on April 11, 2008 03:07 PM
From the looks of the off-board Intel NICs' throughput, they were PCI cards. The maximum speed that the PCI bus can reach is 133 MB/s. That is shared between ALL connected PCI devices, so your speeds are about right for the Intel cards. The onboard cards are probably PCI as well, but they are usually designed with a specially crafted PCI bus to reach speeds close to the maximum speed of Gigabit Ethernet. In order to get a good result from off-board NICs you will need a PCI Express NIC and motherboard, or cards and a motherboard that support a 64-bit PCI bus at 133MHz. Check out this Wikipedia article for more info on the PCI bus: http://en.wikipedia.org/wiki/PCI-X

--Credomane

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 209.33.59.228] on April 11, 2008 03:14 PM
Well, I just reread your article. Seems I missed the PCIe on my first read through. Seems you really ARE using PCI Express. Now I'm clueless.

--Credomane

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 216.49.154.158] on April 12, 2008 04:02 PM
Oh, I thought by a "high end NIC", you were going to mention 10 gigabit NICs with perhaps a transport offload engine. 1G NICs are not "high end" in my book.

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 76.10.145.34] on April 14, 2008 03:17 PM
Mmm.. the latency numbers suggest that perhaps the onboard NICs were not using NAPI, whereas the Intel NICs were using NAPI. Dunno if that's true, but the numbers would all make sense if it were.

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 76.24.161.232] on April 22, 2008 06:44 AM
At 100mbit I've always seen latency around 100-200ms with onboard ethernet. Unless onboard is worse with gigabit connections (doubtful), I think you're comparing the Intel cards against a lemon.

You're also comparing using two different motherboards. Try comparing each with the Intel to see if just one of them is misbehaving.

#

How much can you improve network throughput with a high-end NIC?

Posted by: Anonymous [ip: 78.42.139.214] on May 06, 2008 10:46 PM
If you're using more than two ports of that switch at the same time, the overall throughput will probably stabilize very soon. Many of those cheap Gigabit Switches have a very slow backplane. Why did you use the switch in the first place, if you only tested with two machines?

#




 