Benchmarking NFSv3 vs. NFSv4 file operation performance

NFS version 4, published in April 2003, introduced stateful client-server interaction and “file delegation,” which allows a client to gain temporary exclusive access to a file on a server. NFSv4 also brings security improvements such as RPCSEC_GSS, the ability to send multiple operations to the server at once, new file attributes, replication, client-side caching, and improved file locking. Although NFSv4 offers a number of improvements over previous versions, this article investigates just one aspect of it: performance.

One issue with migrating to NFSv4 is that all of the filesystems you export have to be located under a single top-level exported directory. This means you have to change your /etc/exports file and also use Linux bind mounts to mount the filesystems you wish to export under that single top-level NFSv4 directory. Because NFSv4 exporting requires fairly large changes to system configuration, many sites may not yet have upgraded from NFSv3. That administration work is covered in other articles; this article benchmarks NFSv3 against NFSv4 so you can judge whether your network filesystem performance is likely to improve after the migration.
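As a rough illustration of that reorganization, the Python sketch below generates the two pieces of configuration for a hypothetical layout: bind-mount entries for /etc/fstab and matching /etc/exports lines. The pseudo-root /srv/nfs4, the /home and /data directories, and the 192.168.1.0/24 client network are all assumptions. The fsid=0, nohide, and no_subtree_check options are standard Linux exports(5) options, but check the exports man page for your distribution before using the output.

```python
from pathlib import PurePosixPath


def nfs4_config(pseudo_root, exports, clients="192.168.1.0/24"):
    """Build /etc/exports and /etc/fstab lines for a single NFSv4 root.

    exports maps each real directory to a subdirectory name under
    pseudo_root, where it will be bind-mounted. All paths are hypothetical.
    """
    root = PurePosixPath(pseudo_root)
    # fsid=0 marks this directory as the top of the NFSv4 export tree.
    export_lines = [f"{root} {clients}(rw,fsid=0,no_subtree_check)"]
    fstab_lines = []
    for real_dir, name in exports.items():
        target = root / name
        # Bind-mount the real filesystem underneath the NFSv4 root.
        fstab_lines.append(f"{real_dir} {target} none bind 0 0")
        # nohide lets a client that mounts the root also see this filesystem.
        export_lines.append(f"{target} {clients}(rw,nohide,no_subtree_check)")
    return export_lines, fstab_lines


exp_lines, fstab_lines = nfs4_config("/srv/nfs4", {"/home": "home", "/data": "data"})
print("\n".join(exp_lines))
print("\n".join(fstab_lines))
```

After adding the fstab lines and mounting them, the exports would typically be reloaded with exportfs -ra.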

I ran these performance tests using an Intel Q6600-based server with 8GB of RAM. The client was an AMD X2 with 2GB of RAM. Both machines used Intel gigabit PCIe EXPI9300PT NICs, and the network between them carried virtually no other traffic for the duration of the benchmarks. These NICs provide a very low-latency network, as described in a past article. I ran each benchmark multiple times to ensure the results were repeatable. Because Bonnie++ sizes its test files from the amount of RAM by default, the difference in memory between the two machines changes how it runs: on the server I used 16GB files, and on the client 4GB files. Both machines were running 64-bit Fedora 9.
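Bonnie++ defaults its test file size to twice the machine's RAM so that the page cache cannot hide the real disk and network speed, which is where the 16GB and 4GB figures come from. The sketch below builds the equivalent explicit bonnie++ command lines in Python. The directories /srv/test and /mnt/nfs and the user nobody are hypothetical; on a machine with bonnie++ installed, the resulting list could be passed to subprocess.run().

```python
def bonnie_args(test_dir, ram_mb, user="nobody"):
    """Build a bonnie++ command line using the default 2x-RAM file size.

    -d: directory to test in, -s: file size in MB,
    -r: RAM size in MB, -u: user to run as when started by root.
    """
    size_mb = 2 * ram_mb  # twice physical RAM, so the page cache cannot hold it
    return ["bonnie++", "-d", test_dir, "-s", str(size_mb),
            "-r", str(ram_mb), "-u", user]


server = bonnie_args("/srv/test", 8 * 1024)  # 8GB RAM -> 16GB test file
client = bonnie_args("/mnt/nfs", 2 * 1024)   # 2GB RAM -> 4GB test file
print(" ".join(server))
print(" ".join(client))
```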

The filesystem exported from the server was an ext3 filesystem created on a RAID-5 array of three 500GB hard disks. The exported filesystem was 60GB in size. The array's stripe_cache_size was set to 16384, which for a three-disk array means 192MB of RAM was used to cache pages at the RAID level; distribution defaults for the same array might be in the 3-4MB range. A larger stripe cache generally improves the RAID's write performance. I also ran the benchmarks locally on the server without NFS to get an idea of the theoretical maximum performance NFS could achieve.
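The 192MB figure follows from how the Linux md driver sizes its RAID-5 stripe cache: each of the stripe_cache_size entries holds one page per member disk. The sketch below assumes 4KiB pages (typical on x86-64) and also shows the kernel's default of 256 entries for comparison. On a live system the value is read and set through /sys/block/<md device>/md/stripe_cache_size.

```python
PAGE_SIZE = 4096  # bytes; assumes 4KiB pages, typical on x86-64
MIB = 1024 * 1024


def stripe_cache_bytes(stripe_cache_size, num_disks, page_size=PAGE_SIZE):
    """Memory used by an md RAID-5 stripe cache.

    Each cache entry holds one page for every member disk.
    """
    return stripe_cache_size * page_size * num_disks


print(stripe_cache_bytes(16384, 3) // MIB)  # 192 (setting used here)
print(stripe_cache_bytes(256, 3) // MIB)    # 3 (kernel default of 256)
```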

Some readers may point out that RAID-5 is not a desirable configuration, and running it on only three disks is certainly not typical. However, the relative performance of NFSv3 and NFSv4 is our main point of interest. I used a three-disk RAID-5 because it held a filesystem that could be recreated for the benchmark. Recreating the filesystem removes factors such as file fragmentation that can adversely affect performance.

I tested both NFS versions with and without the async option. The async option allows the NFS server to reply to a write request before the data is actually on disk. The NFS protocol normally requires the server to ensure data has been written to storage successfully before replying to the client. Depending on your needs, you might already run some filesystems with async for the performance improvement it offers, but you should be aware of what it implies for data integrity: in particular, data can be lost without the client detecting it if the NFS server crashes.
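The difference between the two modes can be illustrated with a local-file analogy in Python. This is not NFS server code: write_sync and write_async are hypothetical handlers showing that a sync-style server acknowledges only after os.fsync() has forced the data to disk, while an async-style server acknowledges as soon as the data sits in the operating system's page cache.

```python
import os


def write_sync(path, data):
    """Acknowledge only after the data is on stable storage (like a sync export)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the kernel
        os.fsync(f.fileno())  # block until the kernel has written it to disk
    return "ack"


def write_async(path, data):
    """Acknowledge once the data is in the kernel's page cache (like an async export)."""
    with open(path, "wb") as f:
        f.write(data)
    # The data may still be only in memory here; a crash before writeback
    # loses it even though the caller was already told the write succeeded.
    return "ack"
```

The async handler skips the wait in os.fsync(), which is where its speed advantage, and its risk, both come from.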

The table below shows the Bonnie++ input, output, and seek benchmarks for the NFSv3 and NFSv4 mounted filesystems, as well as for the benchmark run locally on the server. As expected, read performance is almost identical whether or not you use the async option. You can perform more than five times as many seeks over NFS with async, presumably because the server can skip some of the work when a later seek arrives before an earlier one has completed. Unfortunately, sequential block output for NFSv4 is not meaningfully better than for NFSv3. Without async, block output was 47,700 to 49,548 K/sec (roughly 50MB/s), about half of the 91,939 K/sec (roughly 90MB/s) the local filesystem achieved. With async, sequential block output over the NFS mount came much closer to local disk speed.
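To put these comparisons on a common footing, the Python sketch below recomputes two ratios from the table: each NFS configuration's sequential block output as a percentage of the local filesystem's, and the async-to-sync ratio of random seeks. The short configuration labels are simplifications of the full mount-option strings in the table.

```python
# Figures copied from the Bonnie++ table.
LOCAL_BLOCK_OUT = 91939  # K/sec, local filesystem
NFS = {
    # configuration: (sequential block output K/sec, random seeks/sec)
    "NFSv3": (47700, 1704),
    "NFSv3,async": (83729, 9147),
    "NFSv4": (49548, 1649),
    "NFSv4,async": (85796, 9135),
}


def pct_of_local(config):
    """Sequential block output as a percentage of the local-disk figure."""
    return 100 * NFS[config][0] / LOCAL_BLOCK_OUT


def async_seek_gain(version):
    """How many times more seeks/sec the async mount achieved."""
    return NFS[version + ",async"][1] / NFS[version][1]


for cfg in NFS:
    print(f"{cfg:12} block output: {pct_of_local(cfg):5.1f}% of local")
for v in ("NFSv3", "NFSv4"):
    print(f"{v} async seek gain: {async_seek_gain(v):.1f}x")
```

For NFSv3 the seek ratio works out to about 5.4x, consistent with the "more than five times" figure above.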

Configuration	Sequential Output						Sequential Input				Random
	Per Char		Block		Rewrite		Per Char		Block		Seeks
	K/sec	% CPU	K/sec	% CPU	K/sec	% CPU	K/sec	% CPU	K/sec	% CPU	/sec	% CPU
local filesystem	62340	94	91939	22	45533	19	43046	69	109356	32	239.2	0
NFSv3 noatime,nfsvers=3	50129	86	47700	6	35942	8	52871	96	107516	11	1704	4
NFSv3 noatime,nfsvers=3,async	59287	96	83729	10	48880	12	52824	95	107582	10	9147	30
NFSv4 noatime	49864	86	49548	5	34046	8	52990	95	108091	10	1649	4
NFSv4 noatime,async	58569	96	85796	10	49146	10	52856	95	108247	11	9135	21

The table below shows the Bonnie++ benchmarks for file creation, reading, and deletion over NFS. The async option has a tremendous impact on file creation and deletion: sequential creates over NFSv3, for example, rise from 186 to 3,031 files per second.

Configuration	Sequential Create						Random Create
	Create		Read		Delete		Create		Read		Delete
	/sec	% CPU	/sec	% CPU	/sec	% CPU	/sec	% CPU	/sec	% CPU	/sec	% CPU
NFSv3 noatime,nfsvers=3	186	0	6122	10	182	0	186	0	6604	10	181	0
NFSv3 noatime,nfsvers=3,async	3031	10	8789	11	3078	9	2921	11	11271	13	3069	9
NFSv4 noatime	98	0	6005	13	193	0	93	0	6520	11	192	0
NFSv4 noatime,async	1314	8	7155	13	5350	12	1298	8	7537	12	5060	9
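To quantify that impact, the sketch below computes the async speedup for sequential creates and deletes from the table's files-per-second figures. As before, the short labels stand for the full mount-option strings.

```python
# Files/sec from the Bonnie++ file-creation table (sequential columns).
CREATE, DELETE = 0, 1
SEQ = {
    "NFSv3": (186, 182),
    "NFSv3,async": (3031, 3078),
    "NFSv4": (98, 193),
    "NFSv4,async": (1314, 5350),
}


def speedup(version, column):
    """Ratio of async to non-async throughput for one column."""
    return SEQ[version + ",async"][column] / SEQ[version][column]


for v in ("NFSv3", "NFSv4"):
    print(f"{v}: create {speedup(v, CREATE):.1f}x, "
          f"delete {speedup(v, DELETE):.1f}x faster with async")
```

With these figures, async speeds up NFSv3 sequential creates roughly 16-fold and NFSv4 sequential deletes nearly 28-fold.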

To test more day-to-day performance, I extracted the uncompressed Linux kernel source tarball linux-2.6.25.4.tar and then deleted the extracted sources. I used an uncompressed tarball so that decompression on the client's CPU would not slow down the extraction.

Configuration	Extract (m:ss)	Remove (m:ss)
local filesystem	0:01	0:03
NFSv3 noatime,nfsvers=3	9:44	2:36
NFSv3 noatime,nfsvers=3,async	0:31	0:10
NFSv4 noatime	9:52	2:27
NFSv4 noatime,async	0:40	0:08
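A minimal Python sketch of the extract-then-delete timing, assuming a trusted, uncompressed tarball and a scratch destination directory on the NFS mount. It uses tarfile and shutil.rmtree rather than the tar and rm commands, so absolute times will not match the table and this is not necessarily how the runs above were timed, but the structure of the test is the same. fmt_mss prints results in the table's m:ss format.

```python
import shutil
import tarfile
import time


def fmt_mss(seconds):
    """Format elapsed seconds as m:ss, the format used in the table."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


def time_extract_and_remove(tarball, dest):
    """Time extracting an uncompressed tarball into dest, then deleting dest."""
    start = time.perf_counter()
    with tarfile.open(tarball, "r:") as tar:  # "r:" = read without compression
        tar.extractall(dest)
    extract_s = time.perf_counter() - start

    start = time.perf_counter()
    shutil.rmtree(dest)  # delete everything that was extracted
    remove_s = time.perf_counter() - start
    return extract_s, remove_s
```

For example, time_extract_and_remove("linux-2.6.25.4.tar", "/mnt/nfs/scratch") would time one run against a hypothetical /mnt/nfs mount; passing each result through fmt_mss gives values comparable in form to the table.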

Wrap up

These tests show no clear performance advantage to moving from NFSv3 to NFSv4.

NFSv4 creates files at about half the rate of NFSv3 (98 vs. 186 sequential creates per second without async, 1,314 vs. 3,031 with it), but it deletes files faster, most noticeably with async (5,350 vs. 3,078 sequential deletes per second). By far the largest speed gains come from the async option, though using it can lead to data loss if the NFS server crashes or is rebooted.
